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This handbook provides details on functionality, data structures and best practices for BOLD version 3.6. It explains how to use this 
system to collect, manage and publish Barcode and ancillary data. It also provides details on the integrated analytical tools. At any 
time while using BOLD, you can access the online documentation by clicking on the “Get Help” link in the footer of every page, or by 
selecting “Documentation” from the page header. 

For assistance with any feature of BOLD, please email the BOLD Support Team: support@boldsystems.org 
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BOLD 


Introduction 


The Barcode of Life Data Systems (BOLD), established in 2005, is a web platform that provides an integrated 
environment for the assembly and use of DNA barcode data. It delivers an online database for the collection and 
management of specimen, distributional, and molecular data as well as analytical tools to support their validation. 
Over the past few years, BOLD has grown to become a powerful online workbench and the central informatics hub of 
the DNA barcoding community. 

BOLD is freely available to any researcher with interests in DNA Barcoding. By providing specialized services, it 
aids in the publication of records that meet the standards needed to gain BARCODE designation in the international 
nucleotide sequence databases. Due to its web-based delivery and flexible data security model, it is also well positioned 
to support projects that involve broad research alliances. 

BOLD version 3.6 was released in October 2013. This update addresses many usability gaps in the prior version and 
has been built from usage models of novice and expert users. It also introduces new functionality to further 
enable investigation and retrieval of barcode data. New functionality is highlighted throughout this handbook. 
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Public Data Portal: 

A data retrieval interface that allows for searching 
over 1.7M public records in BOLD using multiple 
search criteria including, but not limited to, 
geography, taxonomy, and depository. 


DNA Barcode Education Portal: 

A custom platform for educators and students to 
explore barcode data and contribute novel 
barcodes to the BOLD database. 






Barcode Index Numbers: 

A searchable database of Barcode Index Numbers 
(BINs), sequence clusters that closely 
approximate species. 


Workbench: 

An integrated data collection and analysis 
environment that securely supports the assembly 
and validation of DNA barcodes and ancillary 
sequences. 


www.boldsystems.org 


Available from the BOLD home page, the DNA Barcode Education Portal (BOLD-SDP) is an educational module released in 2013. It supports the 
assembly and analysis of DNA barcode data by students in a classroom environment. Please find more information at www.boldsystems.org/edu 
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Getting Started on BOLD 


The BOLD interface is 
designed to provide easy 
access to commonly used 
features and databases. 

From the top menu, users 
can access the public 
databases, the taxonomy 
browser, the identification 
engines, the user workbench 
and general resources. 
In addition, the footer 
provides access to details 
on the BOLD organization, 
community and partner 
sites. 

Please see the diagram and 
table below for a description 
of the navigation features. 


A publication search page illustrating navigation features 


Description of features numbered in the image above 


1 

Databases 

The Databases link provides access to the following resources that are accessible without signing in. 

Public Data Portal: A database of all of the public sequences on BOLD, including those in the early data 
release phase of the iBOL project. This database can be used to access and download the associated specimen 
data and sequences. Search by taxonomic, geographic, institution or identifier keywords. 

BIN Database: Barcode Index Numbers (BINs) are an interim taxonomic system for animals. Barcodes are 
clustered algorithmically, generating a web page for each cluster which is deposited in this database. Clusters 
show high concordance with species, which provides a fast-track for documenting diversity where taxonomic 
resources are limited. Search BINs by taxonomic, geographic, institution or identifier keywords. 

Primer Database: A searchable database of barcode primers, which includes primer statistics. Search by 
primer code, submitter name or reference keywords. 

Publication Database: A searchable, community maintained database of barcode papers linked to published 
datasets. Search by title, abstract or author keywords. 

2 

Taxonomy 

The taxonomy link provides access to the taxonomy browser, a public resource which contains a page that 
displays the images, distribution map and other details for each taxon on BOLD. Each image uploaded to BOLD 
has a license applied to it. Images from the taxonomy browser may be used if the image is licensed as Creative 
Commons or No Rights Reserved, following the rules of the license. 

3 

Identification 

The identification link provides access to the animal, plant and fungal identification engines based on the COI, 
matK, rbcL, and ITS genes. This resource is available without need for a user account, though advanced features 
are available for those registered with the system. 

4 

Workbench 

The workbench link provides access to the BOLD data analysis and management workbench. After logging in, 
the initial page is the User Console. 

5 

Resources 

The resources link provides access to Site Documentation (including an online version of this handbook), 
Barcoding resources and access to data releases from Barcoding initiatives like iBOL. 

6 

Log in/out 

In the top right corner of any page, users can log in or log out. When logged in, the user’s name appears in this 
section. The gear symbol also provides access to edit account details for users who are logged in. 




7 

Search Bars 

In the public databases listed on the previous page, a search bar is present at the top of the page. Users can enter 
any combination of keywords to search within these databases. For example, searching “Lepidoptera Canada” in 
the Public Data Portal will return all of the Lepidoptera records collected in Canada. 

For further details and examples for using the search functionality, see the search help section that is available by 
clicking on the help button to the right of the search bar in each database. 

8 

Downloads 

In each database, there is also an option to download the public data returned from a search. This includes 
barcode sequences, traces, specimen data, bibliographies, and primer sequences. Distribution maps are available 
for BIN and Public Data searches. 


Registering on BOLD 


The registration form for a BOLD account is available by clicking on the Workbench or Log In link and then selecting 
“Create Account”. After the registration is submitted, a welcome e-mail will be sent with the information needed to log in 
and begin using the BOLD workbench. Upon signing in, users will see the User Console. After users start to contribute 
data and join collaborators’ projects, the User Console will provide progress statistics and activity feeds (see page 22 for 
a depiction of the User Console). 

The User Console provides access for searching the data on BOLD by project code, title or tag, or by using the record 
search (see page 25 for more details on the record search within the Workbench). After gaining access to projects (either 
by creating them or being added to colleagues’ projects), users can upload specimen data, images, and sequence data. 


Why register for a User Account? 

Getting an account on BOLD expands the list of options available to a user beyond access to public data and use of 
the identification engine. Users can annotate published data to help curate and cleanse the barcode library. Moreover, 
users will be able to submit data to BOLD and gain access to other in-progress, private projects with the permission of 
the data owners. Once records are on BOLD, a large set of analytical tools is available for validation and generation of 
reports for publication. The system will automatically check sequences submitted for Barcode Compliance and provide 
reports on records lacking pertinent details. BOLD also provides a simple mechanism to release sequences publicly, as 
well as into partner nucleotide databases such as GenBank. 


User Preferences 

User Preference options are available for logged in users to edit their account details. By clicking on the gear symbol in 
the upper right corner, users can edit their email address, institutional affiliation and name, as well as change the account 
password. The User Preferences page also provides some profile statistics for each user, such as the number of projects 
the user manages or participates in. 




Databases: Identification 

The library of sequences collected in BOLD is available for 
facilitating identification of unknown sequences. The ID engines 
use all sequences uploaded to BOLD from private, as well as public 
projects, to locate the closest match. To protect BOLD users, no 
sequence information from private records is exposed. 

Animal Identification (COI) 

The BOLD Identification System for animals accepts sequences 
from the 5’ region of the mitochondrial gene COI and returns 
a species-level identification when possible. Further validation 
with independent genetic markers is desirable in some forensic 
applications. BOLD uses the BLAST algorithm to identify single 
base indels before aligning the protein translation to a profile 
Hidden Markov Model of the COI protein. There are four 
databases within BOLD for use in identification of COI sequences: 

1. All Barcode Records Database includes: 

Every COI barcode record on BOLD with a minimum sequence 
length of 500bp (Warning: This is an un-validated database and includes 
records without species level identification). This includes many species 
represented by only one or two specimens, as well as all species with 
interim taxonomy. This search only returns a list of the nearest matches 
and does not provide a probability of placement to a taxon. 

2. Species Level Barcode Database includes: 

Every COI barcode record with a species level identification and a 
minimum sequence length of 500bp (Warning: This is an un-validated 
dataset). This includes many species represented by only one or two 
specimens, and all species with interim taxonomy. 



Identification Engine Results Page for batch identification 


3. Public Record Barcode Database includes: 

All published COI records from BOLD and GenBank with a minimum sequence length of 500bp. This library is a collection of records from the 
published projects section of BOLD. 


4. Full Length Barcode Database includes: 

A subset of the Species library with a minimum sequence length of 640bp and containing both public and private records. This library is intended for 
short sequence identification as it provides maximum overlap with short reads from the barcode region of COI. 
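
The inclusion criteria for the four COI databases above can be summarised as a small decision function. The following is only an illustrative sketch: the `BarcodeRecord` class and its field names are hypothetical and are not part of BOLD, and the real databases apply further rules that this handbook does not spell out (for example, the Public Record database also draws on GenBank records).

```python
from dataclasses import dataclass


@dataclass
class BarcodeRecord:
    """Hypothetical, minimal view of a COI record; not a BOLD data structure."""
    length_bp: int
    has_species_id: bool
    is_public: bool


def eligible_databases(rec: BarcodeRecord) -> list[str]:
    """Return the COI identification databases whose stated criteria the record meets."""
    dbs = []
    if rec.length_bp >= 500:
        # All Barcode Records: any COI record of at least 500bp.
        dbs.append("All Barcode Records")
        if rec.has_species_id:
            # Species Level: additionally needs a species level identification.
            dbs.append("Species Level Barcode")
        if rec.is_public:
            # Public Record: additionally needs to be published.
            dbs.append("Public Record Barcode")
    if rec.length_bp >= 640 and rec.has_species_id:
        # Full Length: subset of the Species library with at least 640bp.
        dbs.append("Full Length Barcode")
    return dbs


print(eligible_databases(BarcodeRecord(length_bp=650, has_species_id=True, is_public=False)))
```

A private record can appear in three of the four databases in this sketch, which is consistent with the statement that the ID engines use private as well as public sequences without exposing private sequence information.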


Fungal (ITS) and Plant (rbcL & matK) Identification 

In the BOLD Identification System, ITS is the default identification tool for fungal barcodes and rbcL and matK are the defaults for 
plant barcodes. Both return a species-level identification when possible. Further validation with independent genetic markers will be 
desirable in some forensic applications. The BLAST algorithm is employed in place of BOLD’s internal identification engine for these 
sequences. There are relatively few fungal and plant records on BOLD so most queries will likely not return a successful match. This 
will improve as sampling efforts continue in these kingdoms. These databases include many species represented by only one or two 
specimens, as well as all species with interim taxonomy. Both searches only return a list of the nearest matches and do not provide 
a probability of placement to a taxon. 


Fungal Database includes: 

Every ITS barcode record on BOLD with a minimum sequence length of 
100bp (Warning: This is an un-validated database that includes records 
without species level identification). 

Plant Database includes: 

Every rbcL and matK barcode record on BOLD with a minimum sequence 
length of 500bp (Warning: This is an un-validated database that includes 
records without species level identification). 


NEW! Emailing Identification Results 

The newest version of BOLD provides the ability to 
have identification results emailed when submitting 
individual sequences or a batch of query sequences. 
Users must be signed in to utilize the emailing or 
batch identification features. 
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Databases: Taxonomy Browser 

The Taxonomy Browser is a synthetic database that allows 
users to examine the progress of DNA barcoding by browsing 
through the different levels of the taxonomic hierarchy available 
on BOLD. 

Within the Taxonomy Browser users are able to select between 
the animal, plant, fungus, and protist kingdoms and navigate 
from phylum to species level. Statistics on the progress of 
DNA barcoding at each taxon are generated from both public 
and private data while protecting private user-owned data. 

To look up a specific taxon directly, use the search function by 
entering a taxonomic name into the search bar at the top of 
the Taxonomy Browser or on the BOLD main page. 


1. Lineage 

Displays the taxon name and the higher 
taxonomic levels. 

2. Sub-Taxonomy 

Links to all sub-taxa with number of specimen 
records for each. 

3. Taxon Description 

Displays the description of this taxon from the 
Wikipedia website. 

4. Statistics 

These statistics are compiled by BOLD for this 
taxon. A species progress list can be downloaded 
for each rank that has sub-taxa. The published 
and released sequences for this taxon in the 
Public Data Portal can be accessed from this 
section. 

5. Contributors 

Graphs depicting the institutions that provided 
the samples and sequencing for the samples. 

6. Imagery 

A random selection of the images available 
for the subtaxa of this taxon. Mousing over an 
image selects it for higher-resolution display to 
the right. 

7. Image Details 

The taxonomic identifier, the sample identifier 
and image licensing details are displayed beneath 
the image that is selected. 

8. Collection Sites 

A map of the collection sites for records in 
BOLD, including a list of the top countries. 

9. Taxon Occurrence 

A map of the occurrence data for this taxon 
worldwide, streaming from the GBIF website. 


Information available at each taxonomic level in the BOLD taxonomy browser. 





BOLD taxonomy browser page for the Nymphalid butterflies 
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Databases: Publication 


The Publication Database is accessible 
from anywhere in the application by 
clicking on Databases in the header 
or Publication Database in the footer. 
This database indexes title, abstract, 
year, and authors, allowing for broad 
searches. 

Selecting a publication from the 
database will provide further details, 
including a link to the article on the 
journal’s site and access to the records 
if they are in BOLD. 


A citation or set of citations can be 
downloaded from BOLD using the 
button to the right of the search bar. 


Bibliographies can be submitted 
to this database by following the 
directions on page 21. 





Publication Database 


Databases: Primer 


The Primer Database is accessible 
anywhere in the application by clicking 
on Databases in the header or Primer 
Database in the footer. Using the 
search bar, users can search for 
terms that appear in the primer code, 
submitter or reference fields. 

Selecting a primer will provide details 
on the primer, including primer 
performance statistics derived from 
data submitted to BOLD. 

A primer or set of selected primers 
can be downloaded in FASTA format 
using the button to the right of the 
search bar. 
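
For readers unfamiliar with FASTA, the sketch below shows what a primer list looks like in that format. It is a minimal illustration only: the function name is hypothetical, and the exact header layout that BOLD writes into its primer downloads is an assumption (only the primer code is used as the header here). The BirdF1 sequence is the example that appears in this handbook's Primer Database figure.

```python
def primers_to_fasta(primers: dict[str, str], width: int = 60) -> str:
    """Format primer code -> sequence pairs as FASTA text."""
    lines = []
    for code, seq in primers.items():
        seq = "".join(seq.split()).upper()  # drop any whitespace inside the sequence
        lines.append(f">{code}")            # header line: '>' followed by the primer code
        # Wrap the sequence at a fixed width, as is conventional for FASTA.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


print(primers_to_fasta({"BirdF1": "TTCTCCAACCACAAAGACATTGGCAC"}), end="")
```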


New primers must be 
registered with BOLD before 
trace files generated using them 
are submitted. For details on 
registering a new primer, see 
page 21. 








Primer Database 


Selected Primer Data (example shown in the Primer Database screenshot) 

Primer Code (3 to 12 letters): BirdF1 
Primer Name: null 
Alias Codes (Comma Separated): null 
Target Marker: COI-5P 
Cocktail Primer: No 
Primer Sequence (5' to 3'): TTCTCCAACCACAAAGACATTGGCAC 
Direction: F 
Reference/Citation: Hebert et al 2004 
Notes: null 
Publication Status: Published 

Primer Stats 

Number of Sequencing Runs: 18860 samples 
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Databases: Public Data Portal 


The BOLD Public Data Portal is a publicly accessible database of all of the public sequences on BOLD, including those in the early data 
release phase of the iBOL project. This database can be used to access and download the associated specimen data and sequences. 


Searching the Data Portal 

By accessing the Public Data 
Portal search from the Databases 
link in the header of the BOLD 
home page, users can search the 
public database using taxonomy, 
geography (country or state/ 
province), and institution keywords, 
BOLD Project Codes, or by using 
Sample ID or BOLD Process ID to 
find an individual record. 

Users can enter any combination 
of keywords into the search bar. 
For example, searching “Lepidoptera 
Canada” will return all of the 
Lepidoptera records collected in 
Canada. Searching “Lepidoptera 
Canada -Ontario” will return the 
same results with the specimens 
collected in Ontario removed. 
For further details and examples 
for using the search functionality, 
see the search help section that 
is available by clicking on the help 
button to the right of the search 
bar. 
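
The keyword behaviour described above (all plain keywords must match, and a keyword prefixed with a minus sign excludes records) can be sketched as follows. This is a simplified model for illustration, not BOLD's search implementation: the function names are hypothetical, matching is assumed to be case-insensitive, and multi-word terms or other operators covered in the search help section are not handled.

```python
def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a search string into included and excluded keywords."""
    include, exclude = [], []
    for term in query.split():
        if term.startswith("-") and len(term) > 1:
            exclude.append(term[1:])  # "-Ontario" excludes records tagged Ontario
        else:
            include.append(term)
    return include, exclude


def matches(record_keywords: set[str], query: str) -> bool:
    """True if the record has every included keyword and none of the excluded ones."""
    include, exclude = parse_query(query)
    have = {k.lower() for k in record_keywords}
    return all(t.lower() in have for t in include) and not any(
        t.lower() in have for t in exclude
    )


print(parse_query("Lepidoptera Canada -Ontario"))
```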

Results 



Public Data Portal - Records Grouped by BINs 




Public Data Portal - Specimen Record 


The search results will display a list of BINs or records based on the 
options selected. For more information on BIN pages, please see the next 
page. Clicking on “Record List” will convert the result list to matching 
public records only, and clicking on “BIN List” will convert the list to all 
BINs available. 

Specimen Record 

The record page gives information on the specimen identifier, taxonomy, 
specimen details, collection data (including collection site), sequence 
information, specimen image details, and attribution details. The image to 
the right shows the details page for a particular record. 

A record page will reference a BIN if one is associated and provides links 
to GenBank records. 



Data Portal - Specimen Record 


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Databases: Barcode Index Numbers (BINs) 


The Barcode Index Number System is an online framework that clusters barcode sequences algorithmically, generating a web page for 
each cluster. Since clusters show high concordance with species, this system can be used to verify species identifications as well as 
document diversity when taxonomic information is lacking. 

This system consists of three parts: 

• A clustering algorithm employing graph theoretic 
methods to generate operational taxonomic units 
(OTUs) and putative species from sequence data 
without prior taxonomic information. 

• A curated registry of barcode clusters integrated with 
an online database of specimen and taxonomic data with 
support for community annotations. 

• An annotation framework that allows researchers 
to review and critique the taxonomic identifications 
associated with each BIN and notify data owners of 
errors. 

The BIN framework can greatly expedite the evaluation and 
annotation of described species and putative new ones while 
reducing the need to generate interim names, a non-trivial issue 
in barcoding datasets. The BIN algorithm has been effectively 
tested on a broad set of taxonomic groups and shows potential 
for applications in species abundance studies and environmental 
barcoding. The registry employs modern URI and web service 
functionality enabling integration with other databases. 

Ratnasingham S, Hebert PDN (2013) A DNA-Based Registry for 
All Animal Species: The Barcode Index Number (BIN) System. PLoS 
ONE 8(8): e66213. DOI: 10.1371/journal.pone.0066213 


BIN pages display aggregated data in several sections: 


1. BIN Details 

BIN details include BIN identifiers (URI and DOI), 
the member count, and distributional information. 
Also, nearest neighbour BIN details are provided, 
along with the nearest member and the taxonomy 
of that record. 

2. Taxonomy 

The taxonomy of the public data is visible for 
the BIN, with highlighting to indicate taxonomy 
concordance and discordance. NEW! For each 
taxon, logged in users can search the records that 
they have access to by clicking on the magnifying 
glass icon. 

3. Annotation 

The Add Tags & Comments button allows for 
annotation via tagging or comments on several 
aspects of the BIN page. See more details on 
Annotation on the next page. 

4. Distance Distribution 

A histogram plotting the within-BIN species 
distribution against the Nearest Neighbour BIN 
distribution. 

5. Associated Publications 

List of the publications that contain sequences from 
the BIN. 

6. Dendrogram of Sequences 

For BINs with 3-150 members, a circle tree is 
displayed which also includes the nearest neighbour. 
Hovering over taxon names on the circular tree 
highlights the terminal branch. A PDF version of the 
tree is available for download for all BINs with more 
than 2 members. 

Descriptions of elements in BIN pages 




7. Haplotype 

NEW! 

The interactive diagram allows for investigation of 
the haplotypes in the BIN cluster along species and 
geographical splits. Hovering over a haplotype node 
in the diagram reveals details on which species or 
geographical information are grouped. The larger 
the node, the more sequences in the haplotype. The 
thicker the line between nodes, the more closely 
related those two haplotypes are. 

8. Specimen Images 

View images for associated records, along with 
licensing information. 

9. Sampling Sites 

Displays a map of the collection sites based on GPS 
values. 

10. Attribution 

Lists institutions where specimens are deposited, 
sequencing centres, photographers, collectors, 
taxonomists and funding sources. 

NEW! For each institution, logged in users can 
search the records that they have access to by 
clicking on the magnifying glass icon. 

Further Information 

Statistics are available for the collection countries 
and the data owners (not shown). 
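
The haplotype diagram described in item 7 sizes each node by the number of sequences in the haplotype and reports the species and geography grouped under it. The sketch below illustrates that bookkeeping under strong assumptions: a haplotype is taken to be a set of records with identical (already aligned) sequence strings, the function name and the sample data are invented for illustration, and how BOLD actually derives haplotypes or line thickness is not described in this handbook.

```python
from collections import defaultdict


def summarize_haplotypes(records):
    """Group (sequence, species, country) tuples by identical sequence."""
    groups = defaultdict(lambda: {"size": 0, "species": set(), "countries": set()})
    for sequence, species, country in records:
        node = groups[sequence.upper()]
        node["size"] += 1               # node size: number of sequences in the haplotype
        node["species"].add(species)    # species split shown when hovering over a node
        node["countries"].add(country)  # geographical split shown when hovering over a node
    return dict(groups)


# Made-up records, for illustration only.
demo = [
    ("ACGT", "Species A", "Canada"),
    ("acgt", "Species A", "Finland"),
    ("ACGA", "Species B", "Canada"),
]
for haplotype, node in summarize_haplotypes(demo).items():
    print(haplotype, node["size"], sorted(node["species"]), sorted(node["countries"]))
```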



Annotation 


As the volume of barcode data being generated increases rapidly, 
the need for routine curation has become apparent. BOLD’s 
annotation and notification system supports rapid community-based 
validation of barcode data. Annotation can occur at the 
project level, record level, and also on specific data elements 
including taxonomy, images, and sequences on BIN pages. The 
annotation system leverages the large user-base and expert 
knowledge for curation of both private data within collaborative 
projects and public data through the Public Data Portal. Tagging 
allows for categorization using custom and controlled tags. Both 
custom and controlled tags can be used for filters, searches, and 
workflow management. 

Comments and tags applied to data by BOLD users will appear 
in the activity feed on the User Console and the activity feed on 
the appropriate Project Console. Comments will persist on the 
data element with the user’s full name and a date stamp. Tags can 
be removed at any time by any user. 

Annotation is available wherever the “Add Tags and Comments” 
button appears within BOLD. Users must be signed in to BOLD 
to be able to add tags and comments. 



Annotation pop-up window 




Specimen Data and Sequence Data Pages 

BOLD connects specimen data with sequence data in a biphasic record. The sections below describe what each part consists of and 
how to navigate through the pages. 


Specimen Data 

The Specimen page stores voucher details, taxonomy, specimen 
details and collection data for a specimen. Any user with 
specimen editing permissions can edit the records by selecting 
“Edit Specimen” from the upper left corner. There is a world 
map marked with the location where the specimen was 
collected if GPS data is provided. The images for the specimen 
are located at the top of the window. By selecting an image, 
users will access a zoomable version with further details. 



Users can annotate specimen data, sequence data and individual 
images or traces using the “Add Tags and Comments” buttons 
on these pages. 


Sequence Data 

The Sequence page stores details about the sequence data for a 
specimen. Different markers can be accessed by clicking on the 
tabs in the blue bar. Trace files can be viewed or downloaded 
from this page. Links are provided for the appropriate ID 
Engines for Barcode sequences, which can be used to identify the 
sequence from this page. 

An illustrative barcode sequence of the species is provided 
by BOLD, along with a link to the Laboratory Information 
Management System (LIMS) for the Canadian Centre for DNA 
Barcoding when available. Also, publication details can be added 
to a record once it is published. 

Users can edit sequences by directly editing the FASTA 
file, or delete sequences if needed. NEW! In BOLD 3.6, 
users can assemble sequences or edit assembled sequences 
using the Sequence Editor function on bidirectional trace files. 
See page 20 for further details. 



Specimen Data Page, with image pop-up illustrating 
a zoomed-in feature 


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Workbench: Specimen Data Submission 

The first step to creating records on BOLD is specimen data submission. Each record is assigned a BOLD Process ID when uploaded. 
After specimen data is uploaded, images, traces, and sequences can then be uploaded. There are two ways to enter records onto BOLD: 
manually through the online interface or with bulk spreadsheet submissions through the BOLD Data Managers. 

This protocol assists in the submission of bulk data to BOLD through the BOLD Data Managers. This is the easiest way to populate 
your project with records, as well as the only way to enter new taxonomy into the BOLD library. Described below are the specimen 
data fields available on BOLD. See the next page for further details. 


* Required Fields 


Sample ID * 

ID associated with the sample being sequenced (often 
identical to or an extension of Field or Museum ID). 

Field ID * 

Field number from the collection event or the specimen 
identifier from a private collection. 

Museum ID * 

Identifier for specimen assigned by formal collection upon 
accessioning; also referred to as Voucher ID 

Collection Code (only to be filled in if Museum ID is) 

Code associated with given collection. Used in 
conjunction with Museum ID to disambiguate a specimen 
ID number that might be used in different collections 
within the same institution. 

Institution 
Storing * 

Full name of the institution where specimen or tissue is 
stored. 


Table I: Field definitions for Voucher info page on accompanying spreadsheet. 


Sex: Male/female/hermaphrodite only. 

Reproduction: Sexual/asexual/cyclic parthenogen only. 

Life Stage: The age class or life stage of the specimen(s) at the time of sampling. The field supports free text but is a controlled vocabulary for validation. Example: “Adult”, “Immature”, “pupa”, etc. 

Extra Info: A brief note or project term associated with the specimen for rapid analysis. 

Notes: General notes regarding the specimen. 

Voucher Status: Status of the specimen in an accessioning process. Controlled vocabulary: 

• “Museum Vouchered:Type” 
• “Museum Vouchered:Type Series” 
• “Vouchered:Registered Collection” 
• “To Be Vouchered:Holdup/Private” 
• “E-Vouchered:DNA/Tissue+Photo” 
• “Dna/Tissue Vouchered Only” 
• “No Specimen” 

Tissue Descriptor: A brief description of the type of tissue or material analyzed. Example: “muscle”, “leg”, “thorax”, “liver”, “blood”, “feces”, etc. 

Associated Taxa: A list (concatenated and separated) of taxa associated with the taxon at the time of its collection. References to other taxa should be preceded by the relationship. Use of this field implies knowledge of an associated specimen where the identification is an inference. Use of the Associated Specimens field (below) is preferred when the associated specimen is databased. Examples: “host: Quercus alba”, “prey: caterpillar” 

Associated Specimens: A list (concatenated and comma separated) of other specimens associated with the subject specimen at the time of its collection. References to other specimen identifiers should be preceded by the relationship. Examples: “host: PLANT23452, prey: USNM45677” when both prey and host specimens have been captured. 

External URLs: Web accessible links that provide additional information about the specimen, preceded by a descriptor. Multiple links should be pipe separated (“|”). 


Table 3: Field definitions for Specimen Details page on accompanying spreadsheet. 


Full Taxonomy: Full taxonomy consisting of phylum*, class, order, family, subfamily (optional), genus, species in binomial format. 

Identifier: Full name of primary individual responsible for providing taxonomic identification of the specimen. 

Identifier E-mail: E-mail address of the primary identifier. 

Identifier Institution: The full name of the identifier’s institutional or organizational affiliation if one exists. 

Identification Method: The method(s) used to identify the specimen. 

Taxonomy Notes: Additional notes relating to the identification of the organism. 


Table 2: Field definitions for Taxonomy page on accompanying spreadsheet. 


Collectors: Comma delimited list of collectors. 

Collection Date: Date of collection, must be in DD-MMM-YYYY format. 

Continent: ISO Continents. 

Country/Ocean *: The full, unabbreviated name of the country, major political unit, or ocean. 

State/Province: The full, unabbreviated name of the state, province, territory, or prefecture within the given country. 

Region: Park, county, district, lake or river. 

Sector: Sector of park or county/city. 

Exact Site: Exact location of collection site. 

GPS Coordinates: Latitude & Longitude in “degrees.decimal degrees” format (e.g. 45.837). 

Elevation: Elevation of sampling site. Measured in meters relative to sea level. Negative values indicate a position below sea level. 

Depth: For organisms collected beneath the surface of a water body. Measured in meters below surface of water. 

Elevation Precision: A numerical representation of the precision of the elevation, given in meters and represented as +/- the elevation value. 

Depth Precision: A numerical representation of the precision of the depth, given in meters and represented as +/- the depth value. 

GPS Source: The source of the latitude and longitude measurements. 

Coordinate Accuracy: A decimal representation of the precision of the coordinates given in the decimalLatitude and decimalLongitude. 

Event Time: The time or time of day during which the sample was collected. 

Collection Date Accuracy: A numerical representation of the precision of the eventDate, given in days and represented as +/- the eventDate value. Default is 0 days. 

Habitat: A category or description of the habitat. 

Sampling Protocol: The name of, reference to, or description of the method or protocol used during a collection event. 

Collection Notes: Comments or notes about the collection event. 

Site Code: The name of the sampling location. 

Coll. Event ID: An optional event ID. 


Table 4: Field definitions for Collection Data page on accompanying spreadsheet. 

As outlined below, data can be entered on the Data Submission Template spreadsheet and sent to BOLD. Data managers will review 
and validate the data, ensure that it meets the minimum requirements, and upload it into BOLD. 

1. Create Excel file submission 

• New submissions are project specific, so that their data can be associated with a project on BOLD. If records need to be 
entered into different projects on BOLD, a separate Excel file needs to be created for each project. BOLD supports the upload 
of multiple specimen records in a spreadsheet format. 

• The data spreadsheet consists of 4 worksheets: a main specimen identifier worksheet (voucher info) that is linked to three 
other worksheets: taxonomy, specimen details, and collection data. Tables 1-4 describe the information accepted in the batch 
specimen data submission. Minimal information can be submitted to start and records can be updated at a later date. Figures 1-4 
below illustrate example data filled into the accepted fields for Template 3.0. (This spreadsheet template is available from the 
online version of this protocol in the Resources tab, or at: 
http://www.boldsystems.org/submissionTemplates/SpecimenData_v3Transitional.xls) 

• The minimal requirements for a new specimen record on BOLD are: 

• Voucher Info Page - Sample ID 

• Voucher Info Page - Field ID and/or Museum ID 

• Voucher Info Page - Institution Storing 

• Taxonomy Page - Phylum 

• Collection Page - Country 
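
The minimal requirements listed above can be checked before a spreadsheet is submitted. The following is a minimal sketch in Python, not part of BOLD: it assumes each record has already been read into a dictionary, and the key names used here are hypothetical stand-ins for the template's column headers.

```python
# Hypothetical key names; the real template's column headers may differ.
REQUIRED_ALWAYS = ("Sample ID", "Institution Storing", "Phylum", "Country")
ID_ALTERNATIVES = ("Field ID", "Museum ID")


def missing_minimal_fields(record):
    """Return the minimal requirements that a record does not meet."""
    def filled(name):
        return bool(str(record.get(name) or "").strip())

    missing = [name for name in REQUIRED_ALWAYS if not filled(name)]
    # At least one of Field ID or Museum ID must be present.
    if not any(filled(name) for name in ID_ALTERNATIVES):
        missing.append("Field ID and/or Museum ID")
    return missing


record = {
    "Sample ID": "demo01",
    "Field ID": "Sample-demo01",
    "Institution Storing": "Burke Museum",
    "Phylum": "Arthropoda",
    "Country": "",
}
print(missing_minimal_fields(record))
```

An empty list means the record meets the minimal requirements; the Data Managers' own validation remains the authoritative check.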

2. Submit file to BOLD for processing 

• Open the destination project in BOLD 

• Click on “Specimen Data” under the Uploads menu, choose “Initiate Batch Submission”, and select “New” for the submission 
type. This option is available to project managers and project users with edit specimen access. 

• In the form, select the Excel file to submit to this project, along with email addresses for collaborators who should be cc’d 
on further communications regarding the submission, a priority level and a note if needed. Then click “submit” to submit the 
spreadsheet for the first pass of validation. 

• If any errors are detected in the first pass of validation, please resolve them in the spreadsheet and re-submit. 

• The Data Management team will contact you if there are any issues during validation, and once the records have been uploaded 
to your project. 


Specimen Info 

Sample ID: demo01 
Field ID: Sample-demo01 
Museum ID: 15466-JUC-ISC 
Collection Code: ISC 
Institution Storing: Burke Museum 


Taxonomy 

Sample ID: demo01 
Phylum: Arthropoda 
Class: Insecta 
Order: Diptera 
Family: Asilidae 
Subfamily: Hydropsychinae 
Genus: Efferia 
Species: Efferia aestuans 
Identifier: Joe Smith 
Identifier Email: jsmith@BIO.org 
Identifier Institution: Oxford 
Identification Method: Morphology 
Taxonomy Notes: (blank) 


Specimen Details 

Sample ID: demo01 
Sex: Female 
Reproduction: Sexual 
Life Stage: Adult 
Extra Info: Region 1 
Notes: Collected with predator 
Voucher Status: vouchered: registered collection 
Tissue Descriptor: leg 
Associated Taxa: Predator: Hornet 
Associated Specimens: Predator: BITK002-12 
External URLs: www.burke.edu/mus/spec15466 


Collection Info 

Sample ID: demo01 
Collectors: Joe Smith 
Collection Date: 2-Jul-2009 
Continent/Ocean: North America 
Country: Canada 
State/Province: Ontario 
Region: Wellington 
Sector: Guelph 
Exact Site: Riverside Park 
Latitude: 43.563 
Longitude: -80.270 
Elevation: 325m 
Depth: (blank) 
Elevation Precision: 2m 
Depth Precision: (blank) 
GPS Source: Garmin 
Coordinate Accuracy: 1 m 
Event Time: morning 
Collection Date Accuracy: 2 
Habitat: dry forest 
Sampling Protocol: Malaise 
Collection Notes: park entrance 
Site Code: #14 
Collection Event ID: #M872a 


Figures 1-4: Example Specimen Data Submission. 

Here are some important notes on fields for new or update submissions: 

Sample IDs (Voucher Page): 

• It is important to use a unique and original format for the Sample IDs. If the Sample IDs provided are not unique on BOLD, 
they will need to be changed before the data can go online. 

• Only the following characters may be used in the Sample ID, Field ID, and Museum ID: numbers, letters, and ^ . : - _ ( ) # 
All other characters will be removed. 
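
To preview how an identifier might look once disallowed characters are removed, the rule above can be sketched as follows. This is an illustration only, not BOLD's own routine: it assumes that "letters" means unaccented A-Z and a-z, and it takes the allowed punctuation to be ^ . : - _ ( ) #, which is an assumption where the printed list is ambiguous. The function name is hypothetical.

```python
import re

# Assumed allowed set: ASCII letters, digits and ^ . : - _ ( ) #
_DISALLOWED = re.compile(r"[^A-Za-z0-9^.:\-_()#]")


def clean_identifier(value):
    """Drop every character outside the assumed allowed set."""
    return _DISALLOWED.sub("", value)


print(clean_identifier("15466-JUC-ISC"))
print(clean_identifier("Sample_#1 (a)/b"))
```

Note that spaces are not in the allowed list, so an identifier containing a space would be shortened; comparing the cleaned value with the original is a quick way to spot IDs that would change on upload.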

Collection Code (Voucher Page): 

• The Collection Code must be used in conjunction with Museum ID in order to disambiguate an ID number that might be used 
in different collections within the same institution (e.g., a specimen number in a large museum may appear in the bird, mammal 
and reptile collections). This field is only to be used if the Museum ID field is used. 

Interim Species Names (Taxonomy Page): 

• Interim names should contain non-Linnean characters such as numbers, punctuation and/or extra capitalization. Taxonomists 
are encouraged to append their initials to interim names. (Example: Morpho sp. 1 KHR) 

Extra Info (Specimen Details Page): 

• The “Extra Info” field can be displayed on a Taxon ID Tree on BOLD and thus it is possible to include information that may aid 
in analysis when illustrated on a tree. 

Collection Date and Accuracy (Collection Page): 

• If there is a date range, the Collection Date should be the mid-point, and the +/- can be entered into the Accuracy field. 
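
As a worked illustration of the mid-point rule, the sketch below converts a collection date range into a Collection Date and a Collection Date Accuracy. It is not a BOLD tool; the function name is hypothetical, the day is written with two digits to follow the DD-MMM-YYYY pattern, and rounding the half-width up for ranges with an odd number of days is a choice made here so that the mid-point plus or minus the accuracy always covers the whole range.

```python
from datetime import date, timedelta

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def collection_date_fields(start, end):
    """Return (Collection Date, Collection Date Accuracy) for a date range."""
    if end < start:
        raise ValueError("end date precedes start date")
    span = (end - start).days
    midpoint = start + timedelta(days=span // 2)
    # Round the half-width up so that midpoint +/- accuracy covers the range.
    accuracy = span - span // 2
    label = f"{midpoint.day:02d}-{MONTHS[midpoint.month - 1]}-{midpoint.year}"
    return label, accuracy


# A sample collected between 30 June and 4 July 2009.
print(collection_date_fields(date(2009, 6, 30), date(2009, 7, 4)))
```

Month abbreviations are spelled out in the code rather than taken from the system locale, so the result stays in English regardless of the computer's language settings.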


What are Projects and Process IDs? 

All of the data in BOLD are organized by projects. Related projects can be grouped into containers or temporarily merged with 
related projects for analysis, etc. 

An individual entry in the database represents a barcode of a given specimen. The Process ID (assigned by BOLD upon specimen 
data record upload) uniquely represents a sample in BOLD. This is the identifier that is used to track a sample through the 
barcoding process: collection, taxonomic identification, sequencing, analysis and final publication of data. 


Updating Specimen Data 

An update modifies records that already exist in a project. To update only one or two records, select the 
specimen from the record listing in your project and click on the “edit” button in the upper right corner. Any details can be 
edited in this way, but new taxonomy cannot be added to BOLD through manual edits. 

The quickest way to update a large number of records is to download and revise the Data Spreadsheet from BOLD. To do so: 

1. Click on “Data Spreadsheets” from the Downloads menu on the left side of your project, merged projects or record search. 
(Please note: Records from any number of projects can be updated in one submission spreadsheet, and the number of records 
is, in theory, unlimited for this type of update.) 

2. Only download the worksheets that will be affected by the update (e.g. if the taxonomy needs to be updated, only download 
the Taxonomy worksheet; if specimen details and collection data need to be updated, only download the Specimen Details and 
Collection Data worksheets, etc.). Please do not download and submit updates on the progress report. 

3. Modify the data on the downloaded worksheets. The submitted update must reflect what the data should be on BOLD. 

4. Please send this to the Data Management Team through submissions@boldsystems.org noting the scope of the update, or 
submit through BOLD if it only affects one project. 


If Sample IDs need to be changed after upload to BOLD, please contact the support team through support@boldsystems.org. 

Workbench: Image Submission 


Images should be uploaded to BOLD to 
complete a specimen record. An image 
provides support for identifications and 
makes comparisons easier between 
species. 


This protocol outlines the image 
submission process for BOLD. It describes 
the necessary format of the images and 
the ancillary data and the steps required 
to build the uploadable package required 
for a successful submission. 


1. Collect Images: 

Group high-quality images of specimens 
in .jpg format for your records. BOLD 
accepts high resolution images (up 
to 20 megapixels), but only displays a 
greatly reduced thumbnail. The high 
resolution image is archived but will not 
be distributed without the submitter’s 
explicit consent. Refer to page 18 for a 
guide on picture orientation and quality. 

2. Assemble Package: 

The image submission package should 
consist of all .jpg format images and a 
spreadsheet with the file names and 
ancillary data. Make sure that all images 
in the package are accounted for in the 
spreadsheet. When submitting more than 
one image per specimen simply copy the ‘Sample ID’ to the next line with the file name of the consecutive image. 

You can upload up to 10 images per specimen, depending on organism characteristics. Please photograph several different orientations 
if needed. 

The submission spreadsheet must be named ImageData.xls and contain the columns described in the field definitions table. This spreadsheet 
template is available from the online version of this protocol in the Resources tab on BOLD, or at: 
http://www.boldsystems.org/submissionTemplates/ImageData.xls 

Steps: 

A. Fill in the ImageData.xls spreadsheet with all the data related to the images in the submission package. To easily create the list of 
image files in a folder, open a terminal window (Start > Run > cmd in Windows), navigate to the folder containing the image files, and 
run one of the following commands: 

• Windows: dir /b *.jpg > list.txt 

• MacOS: ls *.jpg *.JPG > list.txt 

• Linux/Unix: ls *.jpg *.JPG > list.txt 

These commands will generate a list of all the image files in the current folder and save it in a document called ‘list.txt’ that will appear in the 
current folder. You can then open list.txt and move the data into the Image File column. Please see the next page for an example of 
the completed ImageData.xls spreadsheet and further steps to upload. 
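
For users who prefer a single approach on every operating system, the same file list can be produced with a short Python script. This is an optional sketch rather than part of the BOLD protocol; the function name is hypothetical, and it assumes Python 3 is installed.

```python
from pathlib import Path


def write_image_list(folder, output_name="list.txt"):
    """Write the names of all .jpg/.JPG files in folder to a text file."""
    folder = Path(folder)
    # Keep the exact spelling and case of each name, since BOLD matches
    # file names case sensitively.
    names = sorted(p.name for p in folder.iterdir()
                   if p.is_file() and p.suffix in (".jpg", ".JPG"))
    (folder / output_name).write_text("\n".join(names) + "\n", encoding="utf-8")
    return names


if __name__ == "__main__":
    # List the images in the folder the script is run from.
    print(write_image_list("."))
```

The resulting list.txt can be opened and its contents pasted into the Image File column in the same way as the output of the commands above.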


Image File *: Complete (incl. extension) and identical file name (case sensitive) of images. 

Original Specimen *: Enter “Yes” if the image shows the actual specimen for this record. Otherwise enter “No”. 

View Metadata *: Controlled vocabulary term to group media depicting a specific set of features of the organism or related environment. Dorsal, Lateral, Ventral, etc. 

Caption: Free text description of the subject. Short descriptions are recommended, such as: part of organism photographed, life stage, sex, etc. (400 Characters) 

Measurement: Any single relevant measurement that was taken in metric units. 

Measurement Type: Item or feature that was measured. 

Sample ID *: Sample ID for record, which must match Sample ID in BOLD. 

Process ID: Process ID for record (not necessary). 

License Holder *: The primary individual holder of the license. This is less critical when using creative commons licenses. 

License *: Pick one of the following license types or short-forms: 

• Copyright (or c) 
• No Rights Reserved (or nrr) 
• CreativeCommons - Attribution (or by) 
• CreativeCommons - Attribution Share-Alike (or by-sa) 
• CreativeCommons - Attribution No Derivatives (or by-nd) 
• CreativeCommons - Attribution Non-Commercial (or by-nc) 
• CreativeCommons - Attribution Non-Commercial Share-Alike (or by-nc-sa) 
• CreativeCommons - Attribution Non-Commercial No Derivatives (or by-nc-nd) 

License Year *: The year of license declaration (not the year of submission to BOLD). 

License Institution *: The primary license holder’s institutional or organizational affiliation. Decisions regarding use of material fall to the institution when the individual is unreachable or unresponsive. 

License Contact *: Contact information for the license holder. Can be an email address, mailing address, phone number, or all of the above. 

Photographer: The individual or team responsible for photographing and editing the media prior to submission. 

Field definitions for accompanying image submission spreadsheet. * Required Fields 


In the spreadsheet itself the fields run across the first line as column headers, with one image per line. The three example lines are shown here side by side: 

Field                 Line 2                    Line 3                    Line 4 
Image File            ROM912D.jpg               ROM912L.jpg               ROM913L.jpg 
Original Specimen     yes                       yes                       yes 
View Metadata         Dorsal                    Lateral                   Lateral 
Caption               skull                     lower jaw                 skull 
Measurement           15 mm                     7 mm                      15 mm 
Measurement Type      skull length              length                    skull length 
Sample Id             ROM 10912                 ROM 10912                 ROM 10913 
Process Id            BMI272-03                 BMI272-03                 BMI273-03 
License Holder        Jane Beck                 Jane Beck                 Jane Beck 
License               Copyright                 Copyright                 Copyright 
License Year          2010                      2010                      2010 
License Institution   Biodiversity Institute    Biodiversity Institute    Biodiversity Institute 
License Contact       photo@bio.org             photo@bio.org             photo@bio.org 
Photographer          Joe Smith                 Joe Smith                 Joe Smith 


Image Submission Spreadsheet (ImageData.xls) completed with sample data. 


Tips and Troubleshooting for Image Uploads 

• Zipped files must be under 190MB in size. If the upload fails to 
initialize, the zipped file may be too large. Break it into more than 
one upload, each with its own spreadsheet. 

• Select the images and spreadsheet and zip those directly. Zipping 
the containing folder only may not work. 

• The spreadsheet cannot contain any formulas. 

• If the upload program cannot find the image files, it may be 
because it cannot read the names. Make sure that the spreadsheet 
contains text values only. 

• Full filenames must be used in the spreadsheet. The extension (.jpg 
or .JPG) must be included in the image file name, which is case 
sensitive. 

• The spreadsheet must be named ImageData.xls. If the upload program 
cannot find the spreadsheet, confirm that it is named correctly 
(case sensitive). 

• Data must start on the second line of the spreadsheet. There is only 
one line for the column headers. 

• Adding extra columns to the sheet will cause errors. 

• Image names can not contain the characters or Please 
rename your images so that they do not have these characters. 

You can upload more images in separate batches to any record at 
any time. If you wish to delete images for a record, please contact 
the BOLD Support Team through support@boldsystems.org. 
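
Several of the tips above can be checked on your own computer before uploading. The sketch below is one possible pre-flight check, not BOLD's validator: it assumes the spreadsheet lines have already been read into dictionaries keyed by column header (the key names are hypothetical), and it applies the limit of ten images per specimen to a single batch only.

```python
from collections import Counter

# Hypothetical key names standing in for the required column headers.
REQUIRED = ("Image File", "Original Specimen", "View Metadata", "Sample ID",
            "License Holder", "License", "License Year",
            "License Institution", "License Contact")
MAX_IMAGES_PER_SPECIMEN = 10


def check_image_rows(rows, files_in_package):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    available = set(files_in_package)
    listed = {str(row.get("Image File") or "") for row in rows}

    for number, row in enumerate(rows, start=2):  # data starts on line 2
        for field in REQUIRED:
            if not str(row.get(field) or "").strip():
                problems.append(f"line {number}: missing {field}")
        name = str(row.get("Image File") or "")
        if name and name not in available:  # comparison is case sensitive
            problems.append(f"line {number}: file not in package: {name}")

    # Every image in the package should be accounted for in the spreadsheet.
    for name in sorted(available - listed):
        problems.append(f"image not listed in spreadsheet: {name}")

    per_specimen = Counter(str(row.get("Sample ID") or "") for row in rows)
    for sample_id, count in sorted(per_specimen.items()):
        if sample_id and count > MAX_IMAGES_PER_SPECIMEN:
            problems.append(f"{sample_id}: {count} images (limit 10)")
    return problems
```

Because images already attached to a record are not visible to this check, a batch that passes may still exceed the per-specimen limit once combined with earlier uploads.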

Image Licensing and Use 

BOLD assumes no license for images uploaded to the database. The image owner maintains the license and may change the license 
on the images at any time. Revisions to a given license should make it more liberal over time, as it is very 
difficult to retract an open license and make it more restrictive. 

If no license is chosen for an image, by default BOLD will forward all requests for that image to the owner for response. Adding a 
license reduces that burden and makes access easier. BOLD encourages the use of CreativeCommons - Attribution Non-Commercial 
Share-Alike, as this license has a good balance of protection and access. 

There are three reasons for having images uploaded for each specimen on the BOLD database. 

1. Quality Assurance - images can be used to confirm the taxonomic identification of organisms during sequence analysis. 

2. Peer Review/Quality Assurance - once records are made public, peers can use your images and sequences to assist in their own validation 
of related specimens. 

3. Taxonomy Browser Taxon Profiles - a random selection of the images on BOLD for each taxon is displayed on the public Taxonomy 
Browser at a highly reduced size (320 x 240) to create an online profile for each taxon that is stored in BOLD. 


B. These two components (Image files and Spreadsheet) need 
to be placed in a single folder. Compress them all into a single 
file before submitting. The following tools are available 
to provide this functionality; however, most modern operating 
systems have built-in functionality for zipping: 

» WinZip - http://www.winzip.com 
» WinRar - http://www.rarsoft.com 
» MacZipIt - http://www.maczipit.com 

C. BOLD will accept a maximum zipped file size of 190 MB. 
Upload the images to BOLD by clicking on the “Images” link in 
the Uploads menu of the User Console. Select the zipped folder 
of images to submit to BOLD. The images will appear immediately 
on the records once the upload is complete. 
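
Steps B and C can also be carried out with Python's standard zipfile module. The sketch below is an optional illustration, not a BOLD utility: the function name is hypothetical, and it assumes that "190 MB" means 190 x 1024 x 1024 bytes. It stores each file at the top level of the archive, which corresponds to zipping the files directly rather than the containing folder.

```python
import zipfile
from pathlib import Path

# Assumption: "190 MB" is read as 190 * 1024 * 1024 bytes.
MAX_ZIP_BYTES = 190 * 1024 * 1024


def build_package(folder, zip_path, extensions=(".jpg", ".JPG", ".xls")):
    """Zip the matching files in folder directly, without the folder itself."""
    folder = Path(folder)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.iterdir()):
            if path.is_file() and path.suffix in extensions:
                # arcname=path.name stores the file at the archive root.
                archive.write(path, arcname=path.name)
    size = zip_path.stat().st_size
    if size >= MAX_ZIP_BYTES:
        raise ValueError(f"{zip_path.name} is {size} bytes; split the upload")
    return size
```

If the size check fails, the images can be divided between two or more folders, each with its own spreadsheet, and packaged separately.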




Photography Guide 

This guide has been developed with assistance from the Canadian Centre for DNA Barcoding in an effort to provide best practices for 
taking voucher photographs. The BOLD database can accept up to ten images per specimen, so besides photos of a mounted or live 
organism, photos of distinguishing features or habitat are also acceptable. 

To provide the best specimen image for viewing on the web, the following guidelines should be adhered to when appropriate: 

• Please take pictures using the high quality mode on your camera (please note that BOLD can accept up to 20MP photos). 

• The specimen should be centered in the image frame. 

• Photos should be taken as close-up to the specimen as possible, leaving very little gap around the edges. 

• It is often beneficial to include a measurement scale in the image to provide a size reference or a colour scale to provide colour reference. 

• Background should be a contrasting colour where possible 

• Standardizing the aspect ratio during specimen photography for your project(s) will make your images easier to compare 

• Standardizing the position/orientation of each specimen will make it much easier to compare specimens across a project or BOLD. 


See below for some common standardized orientations for animals, plants and fungi. 





Dorsal 

• The anterior of the specimen should be 
facing the top of the image frame. 

• The specimen should be face-down, with 
the dorsal aspect of the head visible. 



Lateral 

• The anterior of the specimen should be 
facing the left side of the image frame. 

• The specimen should be oriented with 
the feet towards the bottom of the image. 









Ventral 

• The anterior of the specimen should be 
facing the top of the image frame. 

• The specimen should be face-up, with the 
ventral aspect of the head visible. 



Herbarium Sheets: 

• The full sheet should be included 
in the frame 

• The sheet should be oriented so 
that text is legible in photo 

• Label as “Herbarium Specimen” 


Specific Features: 

• These are often live 
photographs with focus on 
distinguishing characteristics 

• For plants, these may include 
opened fruit, adaxial veins, male 
and female components 

• Macro focus is recommended if 
available 

• Label as “Leaf”, “Fruit”, “Buds”, 
“Bark”, etc. 


Whole Specimens: 

• These are often live 
photographs of the organism in 
its natural location 

• The specimen should be 
centered in the frame and 
provide information such as 
shape of plant, etc 

• If shooting outside, ideal 
environment has no wind and 
is slightly overcast (to avoid 
overexposure) 

• Label as “in situ”, “Plant”, etc. 


Suggested orientations for photographs. 


When entering a new orientation, please capitalize only the first letter and do not add any words (such as “Ventral view”). If your specimen does not fit into any of these 
categories, then please feel free to create a new category of view. (Displayed Specimen Images: All Rights Reserved) 

Workbench: Trace Submission 

Trace files provide support for sequences and should be uploaded 
for every specimen record. They can be uploaded once the data 
submission step is completed and BOLD has assigned a Process 
ID to each record. This protocol assists in the submission of 
trace files to BOLD. It describes the necessary format of the files 
and the ancillary data that are required for a correct submission. 


1. Confirm primers are registered on BOLD. 

See page 8 for details on how to search the primer database to 
confirm your primer is registered with BOLD. 


2. Assemble Package: 

The submission package consists of trace files (.ab1 or .scf), 
corresponding Phred (score) files if available (.phd.1) and a spreadsheet with the file names and ancillary data. The submission 
spreadsheet must be named data.xls and contain the columns described in the field definitions table. This spreadsheet template is available 
from the online version of this protocol in the Resources tab, 
or at: http://www.boldsystems.org/submissionTemplates/data.xls 

Steps: 

A. Fill in the data.xls sheet with all the data about your files. To 
easily create the list of the files in a folder, open a 
terminal window (Start > Run > cmd in Windows), navigate to 
the folder where the trace and score files have been placed and 
run one set of the following commands: 

• Windows: dir /b *.ab1 > ab1.txt and dir /b *.phd.1 > phd.txt 

• MacOS: ls *.ab1 > ab1.txt and ls *.phd.1 > phd.txt 

• Linux/Unix: ls *.ab1 > ab1.txt and ls *.phd.1 > phd.txt 

These commands will generate lists of all the files in the current 
folder. They will be saved as ab1.txt and phd.txt text files. You can 
then open the text files and move the data into the appropriate 
columns, as illustrated below. 
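
When a folder holds many files, matching each trace file to its score file by hand is error-prone. The following Python sketch pairs them automatically. It is an illustration only: the function name is hypothetical, and it assumes that a score file carries the same base name as its trace file (for example KKBNA001-04.ab1 and KKBNA001-04.phd.1), which may not hold for every sequencing facility.

```python
from pathlib import Path


def pair_trace_and_score_files(folder):
    """Return (trace file, score file or '') pairs for one folder."""
    names = sorted(p.name for p in Path(folder).iterdir() if p.is_file())
    scores = {n for n in names if n.endswith(".phd.1")}
    pairs = []
    for name in names:
        if name.endswith((".ab1", ".scf")):
            # Assumed convention: the score file shares the trace file's stem.
            stem = name.rsplit(".", 1)[0]
            candidate = stem + ".phd.1"
            pairs.append((name, candidate if candidate in scores else ""))
    return pairs
```

Each pair supplies the Trace File and Score File cells for one line of the spreadsheet; an empty second value means no score file was found, which is acceptable because the Score File column is not required.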

B. These components (Trace files, Score files and Spreadsheet) 
need to be placed in a single folder. Compress them all into a 
single zipped file before submitting. 

C. BOLD will accept a maximum file size of 190MB. Upload 
the traces to BOLD by clicking on the link “Trace Files’’ in the 
Uploads menu on the User Console. Select the zipped folder of 
files, and choose the institution that performed the sequencing 
to submit to BOLD. 


Tips and Troubleshooting For Trace Uploads 

• To obtain the Process IDs for a set of records on BOLD, click on “Data 
Spreadsheets” under the Downloads menu on the left side of a project console. 
Download the Progress Report (Downloads>Specimen Data) to get the Process 
IDs that are assigned to each Sample ID submitted to BOLD. 

• Primers must be registered before upload. If the primers are not registered, 
there will be an error. 

• Zipped file must be under 190MB in size. If the upload fails to initialize, it is 
probably because the zipped file is too large. Try breaking it into more than one 
upload, each with its own spreadsheet. 

• Select the traces and spreadsheet and zip those directly. Zipping the 
containing folder only may not work. 

• The spreadsheet cannot contain any formulas. 

• If the upload program cannot find the files, it may be because it cannot 
read the names. Make sure that you have text values only in the spreadsheet. 

• Full filenames must be used in the spreadsheet. The extension (.ab1, .scf, .phd.1) 
must be included in the file name. These extensions are case sensitive. 

• The spreadsheet must be named data.xls. If the upload program cannot find 
the spreadsheet, confirm that it is named correctly (case sensitive). 

• Data must start on the second line of the spreadsheet. There is only one line 
for the column headers. 

• Do not add extra columns to the spreadsheet. 

• Trace files may not be downloaded or viewed on BOLD until processed (up to 
24 hours after they have been submitted). 


Trace File *: Complete (including extension) and identical file name (case sensitive). 

Score File: Complete (including extension) and identical file name (case sensitive). 

PCR Primers Fwd/Rev *: Primer codes are case sensitive. Both must be filled in. 

Sequence Primer: Primer codes are case sensitive. 

Read Direction *: Forward or Reverse. 

Process ID *: Process ID of record, which must match Process ID in BOLD. 

Marker (2 blank columns must be left after the Process ID column): If sequencing multiple genes, the marker needs to be filled in to match the short form marker code in your project, such as one of the following: COI-5P, ITS, rbcLa, matK 


Field definitions for accompanying trace submission spreadsheet. * Required Fields 


In the spreadsheet itself the fields run across the first line as column headers, with one trace file per line. The three example lines are shown here side by side: 

Field             Line 2               Line 3                Line 4 
Trace File        KKBNA001-04.ab1      KKBNA001-04r.ab1      KKBNA002-04.ab1 
Score File        KKBNA001-04.phd.1    KKBNA001-04r.phd.1    KKBNA002-04.phd.1 
PCR Fwd           BirdF1               BirdF1                BirdF1 
PCR Rev           BirdR1               BirdR1                BirdR1 
Seq Primer        BirdF1               BirdR1                BirdF1 
Read Direction    Forward              Reverse               Forward 
Process ID        KKBNA001-04          KKBNA001-04           KKBNA002-04 
(blank column) 
(blank column) 
Marker            COI-5P               COI-5P                COI-5P 


Trace File Submission Spreadsheet (data.xls) completed with sample data. 
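
The column order, including the two blank columns before Marker, is easy to get wrong when lines are assembled by script. The sketch below builds one line in the order shown in the field definitions. It is a hedged illustration, not BOLD code: the function and parameter names are hypothetical, and the checks cover only the rules stated in this protocol.

```python
def trace_row(trace_file, pcr_fwd, pcr_rev, read_direction, process_id,
              score_file="", seq_primer="", marker=""):
    """Return one spreadsheet line as a list of ten cell values."""
    if read_direction not in ("Forward", "Reverse"):
        raise ValueError("Read Direction must be 'Forward' or 'Reverse'")
    for label, value in (("Trace File", trace_file), ("PCR Fwd", pcr_fwd),
                         ("PCR Rev", pcr_rev), ("Process ID", process_id)):
        if not value.strip():
            raise ValueError(f"{label} is required")
    # Two empty cells keep the Marker value in the tenth column.
    return [trace_file, score_file, pcr_fwd, pcr_rev, seq_primer,
            read_direction, process_id, "", "", marker]


row = trace_row("KKBNA001-04.ab1", "BirdF1", "BirdR1", "Forward",
                "KKBNA001-04", score_file="KKBNA001-04.phd.1",
                seq_primer="BirdF1", marker="COI-5P")
print(row)
```

Primer codes are passed through unchanged because they are case sensitive; whether a code is registered on BOLD still has to be confirmed in the primer database.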

You can upload more traces in separate batches to any record at any time. If you wish to delete any traces for a record, please contact 
the BOLD Support Team through support@boldsystems.org. 

Workbench: Sequence Submission 

This protocol outlines the DNA sequence submission process on 
BOLD, describing the sequence format and steps required for a 
successful submission. Only users with sequence editing access 
on a project may upload sequences. 


1. Assemble Package: 

The sequence submission package should consist of aligned 
sequences in FASTA format referenced by BOLD Process IDs or 
Sample IDs. 

To upload with Process IDs, the FASTA header line must conform 
to the following format: it should begin with a ‘>’ followed by the 
Process ID, with any additional information separated by either 
a bar (‘|’), an underscore (‘_’) or a space (‘ ’). There can be no 
spaces before the end of the Process ID. 

To upload with Sample IDs, the FASTA header line must conform 
to the following format: it should begin with a ‘>’, followed by the 
Sample ID, with any additional information separated by a bar (‘|’). 

Do not use a space or an underscore to separate information 
from the Sample ID. 
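
The two header formats can be illustrated with a short Python sketch. This is not BOLD's parser: the function names are hypothetical, and the builder always uses the bar as separator because it is the one separator accepted for both Process IDs and Sample IDs.

```python
def fasta_header(identifier, extra=(), id_type="process"):
    """Build a FASTA header line for a Process ID or Sample ID upload."""
    if id_type not in ("process", "sample"):
        raise ValueError("id_type must be 'process' or 'sample'")
    if not identifier:
        raise ValueError("identifier is required")
    if id_type == "process" and " " in identifier:
        raise ValueError("a Process ID cannot contain spaces")
    if "|" in identifier:
        raise ValueError("the identifier cannot contain the bar separator")
    # A bar is valid for both ID types, so it is used for every header.
    return ">" + "|".join([identifier, *extra])


def header_identifier(line, id_type="process"):
    """Extract the identifier from a header line under the rules above."""
    if not line.startswith(">"):
        raise ValueError("a FASTA header must begin with '>'")
    body = line[1:].rstrip("\n")
    # Process IDs end at a bar, underscore or space; Sample IDs only at a bar.
    separators = "|_ " if id_type == "process" else "|"
    for index, char in enumerate(body):
        if char in separators:
            return body[:index]
    return body


print(fasta_header("KKBNA001-04", ["COI-5P"]))
print(header_identifier(">KKBNA001-04_Turdus migratorius"))
```

Running header_identifier over a file before upload shows which identifier would be read from each header, which helps catch Sample IDs that were accidentally separated with a space or an underscore.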

2. Upload Package: 

You can include up to 1000 sequences in one upload. Upload the sequences to BOLD by clicking on “Sequences” in the Uploads 
menu of the User Console. Select the marker and the institution that assembled the sequences. Paste the sequences into the text box. 
When ready, click “submit” to upload the sequences. These will appear immediately on the records. 

• If you wish to replace a sequence on BOLD, simply upload the new one with the same Process ID or Sample ID. 

• To delete an individual sequence, you can do so by using the Delete button within a record’s sequence data page (for more info on Sequence 
pages, please see page 12). Contact the BOLD support team through support@boldsystems.org for batch deletions. 
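
Because one upload can hold at most 1000 sequences, larger FASTA files need to be divided before pasting. The following Python sketch shows one way to do this; the function name is hypothetical, and it assumes that every record starts with a header line beginning with ‘>’.

```python
MAX_SEQUENCES_PER_UPLOAD = 1000


def split_fasta(text, batch_size=MAX_SEQUENCES_PER_UPLOAD):
    """Split FASTA text into chunks of at most batch_size records each."""
    records, current = [], []
    for line in text.splitlines():
        # A new header closes the record collected so far.
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line.strip())
    if current:
        records.append("\n".join(current))
    return ["\n".join(records[i:i + batch_size]) + "\n"
            for i in range(0, len(records), batch_size)]


example = ">A\nACGT\n>B\nAC\nGT\n>C\nTT\n"
for chunk in split_fasta(example, batch_size=2):
    print(chunk)
```

Each chunk returned can be pasted into the upload form as a separate submission; sequences that span several lines stay together with their header.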


Figure: Sequence Upload Form. Required fields are marked with a red asterisk. The form provides a selector for the ID type, a marker 
list, a run site field for the institution responsible for generating the sequences, and a text box for pasting sequences in FASTA 
format. To select an institution, begin to type in the name; BOLD will return matches in a drop-down box and clicking on a name 
will select it. If the institution is not listed, it can be registered with the “Add New Institution” button. 


Workbench: Sequence Editor 

NEW! The online sequence editor is a new feature in BOLD 
version 3.6. This tool provides the ability to assemble contig 
sequences from good quality bi-directional trace files. Further, 
this tool allows users to edit base calls, and upload the sequence 
directly to the record. 

Once trace files have been uploaded to a BOLD record, a user 
with sequence editing permission can select the traces and choose 
the Sequence Editor button from the Sequence Page (see page 12 
for an illustration of this page). 

The Sequence Editor application (see the Sequence Editor figure) aligns the 
forward and reverse traces and generates the contig sequence. 
Base calls may be inspected by hovering over a peak, which 
highlights the corresponding peak and nucleotide. The quality of 
each peak is indicated by the colour gradient directly above the 
nucleotide bases, with green representing a high quality call and 
red representing a high degree of ambiguity. Users can 
edit a base call by clicking on a nucleotide, which opens the edit 
base dialog. The save button in the top right corner allows the 
user to save the sequence to the BOLD record. 



Sequence Editor 
Workbench: Primer Submission 

New primers need to be registered on BOLD prior to 
submitting a trace file package. To register new primers, select 
“Register Primers” from the User Console. Please note: If a 
primer sequence has already been registered under a different 
code, you will be provided with the registered code to be used 
in your submission. Primers you register on BOLD can be 
edited at any time after they are created (e.g. to make them 
public). 


Primer Code 

Create a code for the primer. If the primer is 
already published in a manuscript, please use 
the code that is in press. 

Primer Description 

A description of what the primer is used for. 

Alias Codes 

Other known code names for the primer, 
separated by commas. 

Target Marker 

Select the target marker from the controlled 
list (e.g. ITS, COI 5’, matK, etc.). 

Cocktail Primer 

Select “Yes” if it is a cocktail primer. This will 
create extra fields to add multiple sequences. 

Primer Sequence 

Fill in the sequence(s), 5’ to 3’. 

Direction 

Select the direction of the sequence. 

Reference/Citation 

List references and/or citations. 

Notes 

Any notes about the primer. 

Publicly Available 

If the primer has already been published, or 
if you wish to make it publicly available, this 
should be left public. The other option is to 
keep the primer private until publication. 


Field definitions for Primer Submission. 
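
The primer fields can be tidied before they are typed into the form. This is a hedged sketch only: the function name clean_primer is hypothetical, the 3 to 12 character limit on the code is taken from the submission form, and the restriction of the sequence to IUPAC nucleotide codes is an assumption rather than a documented BOLD rule.

```python
from typing import Tuple

# Assumption: primers use the standard IUPAC nucleotide codes.
IUPAC_CODES = set("ACGTRYSWKMBDHVN")


def clean_primer(code: str, sequence: str) -> Tuple[str, str]:
    """Return (code, sequence) with whitespace removed from the sequence.

    Raises ValueError when the code length or the sequence looks wrong.
    """
    code = code.strip()
    if not 3 <= len(code) <= 12:
        raise ValueError("primer code must be 3 to 12 characters long")
    # BOLD strips spaces on submission; doing it here shows the final value.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("primer sequence is empty")
    unexpected = set(cleaned) - IUPAC_CODES
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return code, cleaned
```

A sequence pasted with a stray space, such as the one in the example form, comes back as a single unbroken string.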


Workbench: Bibliography Submission 

Users can submit bibliographies to BOLD using the Bibliography 
Submission Form available in three locations: 

• From the User Console under Data Uploads 

• From a Project Console or Dataset Console 

• From a list of searched records (Record List page) 

Any user with edit sequence or edit specimen permissions to 
records will have the ability to submit a bibliography connected 
to those records as primary or secondary associations. 

The primary and secondary GenBank accessions can be filled 
in here, separated by line breaks. These accessions will auto-populate 
in the form for the records selected if the submission is from 
the Project Console or Record List. (See page 27 for details on 
submitting records for GenBank Accessions via BOLD.) 

The publication details will then appear in the Publication 
Database, and the citation will appear on each record. 


[Figure: the Primer Submission form, filled in for an example forward ITS primer. Required fields, marked with a red asterisk, are Primer Code (3 to 12 letters), Target Marker, Primer Sequence (5’ to 3’), Direction and Reference/Citation. The form notes that a cocktail is a standard combination of multiple primers used in one reaction.] 

Note: Primer sequences should not contain spaces. Any space characters found will automatically be stripped 
before submission. 


BOLD Primer Submission Form 


[Figure: the Bibliography Submission form, filled in with an example publication. Required fields are marked with a red asterisk. The form collects the article title, authors, journal, year, volume, issue, pages, PubMed Central ID, open access status, abstract, date published and date revised (both as YYYY-MM-DD), and language. Associated records are entered as GenBank accessions in two lists: Primary (records generated by authors) and Secondary (records cited by authors). A checkbox confirms that the primary records were generated by the authors.] 


BOLD Publication Submission Form 
Workbench: User Console 


The BOLD User Console is the landing page for users who have logged in. This console allows for rapid access to frequently accessed 
projects, near real-time reports on project activity as well as powerful search tools. 


Data Uploads 

Upload sequences, traces, images, primers, 
and bibliographies directly from the User 
Console. 

Real-time Activity Reporting 

BOLD maintains a detailed log of all the actions 
taken by users in the system. This log 
is transformed into a report where events 
pertinent to each user are extracted and 
displayed on this console. As users work 
together on projects, submitting and refining 
data, the Activity Report allows them to 
stay informed on the steps taken by their 
colleagues. Logs can be downloaded, allowing 
users to keep personal records and 
perform additional analysis. Users can also 
filter recent activity for a particular user or 
action by using the search field at the top 
of the Most Recent Activities panel. 

Searching 

The newest version of BOLD introduces 
an integrated search bar accessible from 
the User Console, Project Console, and 
Project List. It includes a project search, 
where a user can jump directly to a project 
by entering the code in the project search 
field. If the code is not known, the user can 
generate a short list by entering a project 
tag or part of the project title. 

The second search functionality, available 
by clicking on Records: “Search”, opens the 
workbench Record Search interface, which 
allows users to search for records based 
on search terms consisting of geography, 
taxonomy, tags, sequence length, and pasted 
lists of identifiers. Records retrieved from 
a search can be downloaded or analyzed 
right on the system. See the next page for 
further details on the Record Search for the 
workbench. 

Datasets 

The User Console provides access to the 
Dataset framework, including the options 
to create new Datasets and view existing 
public datasets. For more details, please see 
page 26. 



User Console for an example user 


NEW! In the Project Search bar (available on the User 
Console, Project Console and Record List), users can search 
for multiple projects at once by entering more than one 
project code, separated by commas. For example, searching 
“NE0S/,LHS/V1/” will merge those two projects together and 
provide a consolidated project console. 
Workbench: Record Search 

The record search interface in the BOLD workbench allows for rapid access to large numbers of records by searching across a number 
of parameters. 

To search across all records that a user has access to, select the “Search Records” button in the User Console or Project List. The 
form also gives the option to add in all public records on BOLD. Within a Project Console or Record List page, the search allows users 
to choose from searching within the project, or across the entire workbench. 

NEW! The newest version of BOLD includes several new search parameters, including specimen depository, collection date ranges, 
and GPS-based polygons. 

The results from the search will appear in a Record List like the one illustrated on page 25. 


Taxonomy 

Searches specific taxonomic names. Use 
quotation marks to surround species names and 
separate multiple taxa by spaces. Example: 

“Bos taurus” “Bos indicus” 

Geography 

Searches the country and province names. Use 
quotation marks to surround multi-word names 
and separate multiple terms by spaces. Example: 
“Costa Rica” Mexico “United States” 

Marker and 
Sequence Length 

Filter results by the presence of a sequence for a 
specific marker, as well as defined length range. 

Tags 

Enter terms found in tags on records 

Depository 

Accepts depository institution names. 

Extra Info 

Accepts terms that appear in the “Extra Info” field 
on specimen records. 

Collection Dates 

Allows for a date range to be specified. 

Region of Map 

Searches for records with GPS coordinates that 
fall within a user-defined polygon area on 
the map. A polygon can consist of many points, but 
it must contain at least 3 points on the map and 
form a closed loop. The returned results will include all 
of the records that have GPS coordinates within 
the polygon and to which the user has access. 

List of Identifiers 
(2nd Tab) 

Search by Sample IDs, Process IDs, GenBank 
accessions and BIN URIs. Can be used in 
conjunction with the above fields to return the 
intersection of records. 

Search within 
project 

If accessed from within a project, the record search 
will remain restricted to that project unless the 
checkbox is unchecked. 

Include public 
records 

Checking the box at the bottom will allow the 
search to go across all public data on BOLD, as 
well as the records in projects the user has direct 
access to. 


Search Parameters Available 
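
The quoting convention used by the Taxonomy and Geography fields (quotation marks around multi-word names, spaces between terms) can be illustrated with a short parser. This is a sketch of the convention as described above, not BOLD's own code; the function name split_terms is hypothetical.

```python
import re
from typing import List

# A term is either text inside double quotes or a run of non-space characters.
TERM = re.compile(r'"([^"]+)"|(\S+)')


def split_terms(query: str) -> List[str]:
    """Split a search string into terms, keeping quoted names together."""
    # Treat typographic quotation marks like straight ones.
    query = query.replace("\u201c", '"').replace("\u201d", '"')
    return [quoted or bare for quoted, bare in TERM.findall(query)]
```

With this convention, the Geography example above is read as three terms: Costa Rica, Mexico and United States.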



Record Search window, showing a polygon defining the boundaries of a 
National Park. 
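
The Region of Map search returns records whose GPS coordinates fall inside a closed polygon of at least 3 points. One common way to make that test is ray casting, sketched below. This is an illustration of the general technique, not a description of how BOLD implements the search; it treats latitude and longitude as flat coordinates and does not handle polygons that cross the 180° meridian. The function name point_in_polygon is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def point_in_polygon(lat: float, lon: float, polygon: List[Point]) -> bool:
    """Return True if (lat, lon) lies inside the polygon (ray casting)."""
    if len(polygon) < 3:
        raise ValueError("a polygon needs at least 3 points")
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        # Wrapping to the first point closes the loop.
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            lon_cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < lon_cross:
                inside = not inside
    return inside
```

A record is kept when the function returns True for its coordinates.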




Workbench: Creating a project 

To upload Barcode records to BOLD, a project must first be created in which to house the records. From the User Console, select 
the ‘New Project’ button from the Project Management menu. 

All project details can be edited at any time (with the exception of Project Code, Project Type and Project Manager) by clicking 
on ‘Modify Project Properties’ in the Project Options menu of the Project Console. Only the Project Manager is able to modify these 
details. 


Project Title* 

Please create a descriptive name. 

Project Code* 

A 3-5 letter code that needs to be unique across 
BOLD. A good approach is to use initials and 2 or 
3 other letters as an acronym for the title. 

Project Type* 

Choose between the following options: 

• Data Project (contains records) 

• Folder Project (contains other projects) 

Primary Marker* 

Select the primary marker. Primary marker 
options are: COI-5P, rbcL, matK, ITS, 18S-3P, 
18S-V4, trnH-psbA. 

Supporting Markers 

Select as many secondary markers as needed from 
the list of registered markers. 

Campaign 

Select the name of the campaign the project is 
part of if desired. 

Place in Container 

Select the name of the Folder Project if desired. 

Tags 

User-defined or controlled tags may be selected. 

Project Description* 

Enter a summary of the use and intention of the 
project. 15 - 500 characters. 

Bounding Box 

Define the bounding box of the collection area 
covered by the project using GPS coordinates. 

Project Access 

Check to make project publicly visible on BOLD 
and submit to the BOLD Public Data Portal. 

Project Manager 

The person who creates a project is automatically 
the Project Manager. 

Assign Users 

Other BOLD users can be added to a project. 
Different access levels are described below. 


Sequence Access: 

• Analyze Only - user can perform analysis 
on the data, but cannot view more than a 
summary of the data (sequence and related 
information remain hidden). 

• View & Download - user can view or 
download the sequence data, as well as 
analyze. 

• Edit Sequences - user can upload trace files, 
upload, edit and delete sequences, as well as 
view and analyze. 


Specimen Access: 

• Edit Specimens - user has control over 
sample identifiers, taxonomy, collection 
data, and images of the specimens. This edit 
permission level is intended for project 
managers, collectors, and taxonomists. 

NEW! 

Dataset Access 

• Add to Dataset - users can add records to 
datasets, which can be used for analysis or 
publication purposes. 


Field definitions for BOLD project creation form. * Required Fields 
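
The constraints in the field definitions above can be checked before the form is filled in. The sketch below is illustrative only: the function name check_project is hypothetical, the marker codes are copied from the Primary Marker row, and uniqueness of the Project Code across BOLD cannot be tested locally.

```python
from typing import List

# Primary marker options listed in the field definitions.
PRIMARY_MARKERS = {"COI-5P", "rbcL", "matK", "ITS", "18S-3P", "18S-V4", "trnH-psbA"}


def check_project(title: str, code: str, description: str, marker: str) -> List[str]:
    """Return a list of problems found in the required project fields."""
    problems = []
    if not title.strip():
        problems.append("Project Title is required")
    if not (code.isalpha() and 3 <= len(code) <= 5):
        problems.append("Project Code must be 3-5 letters")
    if not 15 <= len(description) <= 500:
        problems.append("Project Description must be 15-500 characters")
    if marker not in PRIMARY_MARKERS:
        problems.append("Primary Marker is not one of the listed options")
    return problems
```

An empty list means the four required fields satisfy the stated limits; BOLD itself still decides whether the code is already in use.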



New Project Form 


Please note that the person who creates a project is 
automatically assigned as the Project Manager. To change 
the project manager, the current Project Manager must send 
a request to the BOLD support staff through 
support@boldsystems.org. 


Supporting markers are added upon request. If a marker 
required is not on the list, please contact BOLD support 
staff to register one through support@boldsystems.org. 
Workbench: Managing Data 

Once a project has been populated with the specimen data, images, 
traces and sequences, it will resemble the figure on the right. 

Project Console 

The project console presents an overview of the status of records 
within the projects as well as an audit trail of the activities in 
the project. This includes a report of the number of specimens, 
along with tallies of any missing components of the records. Also 
included are graphs to provide a quick visual overview of the 
project, as well as a list of all the users with access to the project. 
The links on the left provide access to uploads, downloads and 
various analysis tools. 

Project Managers will see the “Modify Project Properties” button 
with which they can change the project title and description, add 
or remove markers, and add, remove or modify permissions of 
users at any time. The Project Manager also has access to publish 
the records in the project to GenBank. (See page 27 for more 
details on GenBank submissions.) 

To access the records within each project, click on “View All 
Records” in the project options menu. NEW! The Project Search 
and Record Search options appear in the search bar at the top. 


Record List 

A Project Record List is the full list of all records within a project, 
along with the actions and tools open to a project member. 

The record list gives access to individual specimen and sequence 
data for each record. The arrows along the column headers can 
be used to sort the records in ascending or descending order by 
title. Users can select specific records for download, analysis, or 
annotation using the checkboxes or project data options. 

NEW! The Home Project button appears after a record search, 
and displays the Project Code column, to help users find the home 
project for a record. 

Flags 

• Icons appearing next to a record indicate the presence of certain 
characteristics of a record; see legend to the right for more 
details. 

• If a sequence contains more than 1% ambiguity and won’t 
meet the Barcode Standard, the ambiguous character count is 
highlighted in red. 
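
The ambiguity flag can be approximated outside BOLD when screening sequences before upload. The sketch below makes several assumptions that this handbook does not confirm: the threshold is taken to be 1%, every character other than A, C, G or T counts as ambiguous, and gap characters are left out of the calculation. The function names are hypothetical.

```python
AMBIGUITY_LIMIT_PERCENT = 1.0  # assumed threshold


def ambiguity_percent(sequence: str) -> float:
    """Percentage of non-gap characters that are not A, C, G or T."""
    bases = [base for base in sequence.upper() if base != "-"]
    if not bases:
        return 0.0
    ambiguous = sum(1 for base in bases if base not in "ACGT")
    return 100 * ambiguous / len(bases)


def exceeds_ambiguity_limit(sequence: str) -> bool:
    """True when the sequence has more ambiguity than the assumed limit."""
    return ambiguity_percent(sequence) > AMBIGUITY_LIMIT_PERCENT
```

Sequences for which exceeds_ambiguity_limit returns True are worth re-editing from their traces before submission.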

Users with full permissions can move records to another project 
or add them to a dataset by selecting the records needed and 
then clicking on the appropriate link in the Options menu. The 
destination projects and datasets that will appear in the list will be 
ones in which the user has full permissions. 

Click on the Sample ID or the Process ID to access the Specimen 
Data and Sequence Data respectively, for each record. These are 
illustrated on page 12. 



Project Console 


[Figure: the Record List for the example project “Indo-Pacific reef fish community assembly” [IPCOM].] 

Record List 


Icons shown next to a record in the Record List indicate: 

• GPS coordinates present for record 

• Images present for specimen 

• The number of traces present 

• Stop codons present in sequence 

• Contamination present in sequence 

• Flagged record, not in ID engine 

• Sequence is Barcode Compliant 


BOLD Record List icon legend 
Workbench: Datasets and DOIs 

Managing data with projects on BOLD is very efficient for coordinating sequencing activities and digitization of specimen records. 
Projects allow for the effective management of permissions by ensuring that any single record can exist only in one project. Datasets 
extend this functionality by allowing the creation of named collections of records by taxonomic, geographic or thematic criteria. 

Any user can create datasets and add records they have Dataset Level access to. A record may appear in multiple datasets, but is still 
linked to the original record. Changes made to the original record are automatically synchronized with those in datasets. 

A single dataset can hold up to 25,000 records, for assembly of large collections for publication in GenBank or public release. 

In addition to providing accessions through GenBank for records in a dataset, BOLD, through a partnership with DataCite, can provide 
a single DOI for each dataset. The assigned DOI can be used in associated manuscripts in place of supplementary tables and can be 
referenced when used by others. 

Creating A New Dataset 

Once logged into BOLD, select the New Dataset button in the “Your Datasets” section of the User Console. By choosing to make the 
dataset public, the user is provided with the opportunity to request a DOI. 


[Figure: the New Dataset Submission form, filled in for an example dataset titled “New species of Canadian Plusiinae”. Required fields are marked with a red asterisk: Dataset Title, Dataset Code (shown with the prefix “DS-”, followed by a unique 4-8 character alphanumeric code) and Dataset Description (15 characters minimum). Checking “publicly visible” under Dataset Access allows all BOLD users to view, analyze, and download the sequence data. For geographically oriented datasets, a Bounding Box can be given as the latitude and longitude of its top left and bottom right corners.] 

The BOLD Dataset creation form with DOI request shown. 


Publish Confirmation 

You have elected to make this dataset public. Please confirm that you want to proceed. 

If public, this dataset will be accessible in the BOLD Public Data Portal within 48 hours. The records in this dataset can be retrieved using the following permanent URL: http://www.boldsystems.org/index.php/Public_SearchTerms?query=<DATASET CODE>. 

BOLD now supports minting of Digital Object Identifiers (DOI)* for datasets so that they can easily be used in publications and subsequently cited. Please request a DOI if this feature would be useful. 

* DOIs are persistent identifiers used to identify and cite publications but are also very useful in citing data directly. 


Adding And Removing Records 

Users can search for and add records to a dataset in batches of up to 2,500 records at a time from a record list page. 

NEW! The ability to remove records from a Dataset is now available in the Record List for a particular dataset. Users can select 
individual records to remove from the dataset. For further assistance with BOLD Datasets, please email support through 
support@boldsystems.org. 
Workbench: Publication of Records on BOLD and GenBank 


Publication through BOLD 

To submit records to the BOLD Public Data Portal, the Project Manager can make a Project or Dataset public. This option is available 
in the Project Properties form which the Project Manager can access by clicking on “Modify Project Properties” within a Project or 
Dataset. 

Once GenBank accessions have been obtained, users can submit a bibliography to the BOLD database following the directions on page 
21. This will associate the publication with the records using the GenBank Accession numbers. This citation will appear on the records 
in the workbench and the Public Data Portal. 


Publication to GenBank, NCBI 

BOLD shares a tightly integrated data exchange pipeline with NCBI (GenBank) that allows for the automatic submission of data 
to GenBank. Users are only required to fill in the author and publication information, which is sent to GenBank along with the 
specimen, sequence, and trace data, all transformed to the required formats. GenBank responds directly to the user with 
the accessions for their records to be included in publications. Accessions are also sent to BOLD to ensure bidirectional linkage. 

The data exchange pipeline is further utilized to send GenBank updates to records. Identifications of records submitted through 
BOLD to GenBank can still be refined and updated as new information is obtained. Changes to the taxonomy of BOLD records are 
automatically sent to GenBank on a weekly basis so that GenBank has the most current and up to date information. 


To submit a set of sequences to GenBank, the Project Manager can 
access the form shown to the right via the “Submit to GenBank” 
button within a Project or Dataset Console. 

All records within the project will be submitted to GenBank. If 
only a subset of records need to go to GenBank, then the records 
should be moved to a new project or copied to a BOLD Dataset 
which can be submitted. 

GenBank accession numbers are generally returned by email to 
the corresponding author within five business days. The accession 
numbers will be associated with the records on BOLD once they 
are available. 

Submissions to GenBank from BOLD are automatically locked 
for 1 year to allow time for publication. If the publication is 
released sooner than one year, the corresponding author should 
contact GenBank directly to request public release at the time of 
publication. 

Contact the BOLD support team through support@boldsystems.org 
for assistance with any aspect of publication. 


NEW! To access the list of GenBank accessions for 
a set of records on BOLD, download the Progress 
Report spreadsheet from the Downloads Menu, under 
Data Spreadsheet. Alternatively, select the “Summary 
- Specimens, Localities, and GenBank” within a Project 
Console or Record List, to view accessions. 



GenBank Submission Form * Required Fields 
Workbench: Analytical Tools 

BOLD includes core and extended tools to analyze specimen and sequence data: 


Core Analysis Tools 

• Image Library: Compare morphological characteristics 

• Distribution Maps: Interact with geographical data 

• Taxon ID Tree: Visualize a neighbour joining tree with 
matching images 

• Barcode Index Numbers (BINs): Barcode clusters (see 
page 10) 

• Identification Engine: Locate closest matches to an 
unknown sequence (see page 6) 

Extended Analysis Tools 

• Distance Summary: Browse sequence divergence at 
multiple taxonomic levels 

• Sequence Composition: Explore compositional variation 
at all codon positions 

• Barcode Gap Analysis: Evaluate the Barcode gap 

• Accumulation Curve: Review sampling efficiency 

• Alignment Browser: Diagnose unaligned sequences 

• Diagnostic Characters: Examine polymorphism 

• BIN Discordance Report: Utilize BINs to highlight 
possible issues 


Tip: Run Multiple Analytic Tools in Parallel 

Use the email analysis option to run multiple analyses and 
have the results emailed when the analysis is finished. 
Results can be stored for up to 4 weeks, saved for future 
comparison, and links to the results can be shared between 
collaborators. 

Find this option on the parameters page for most analysis 
tools. 


Workbench: Image Library 

Once images have been uploaded to a project, it is possible 
to view them in two ways. The first is by opening an individual 
record, where any corresponding images will be displayed below 
the specimen data. The second is via the Image Library, in order 
to view a group of specimens, as shown below. 

The Image Library displays one image per specimen and allows 
users to choose a specimen orientation view in order to 
compare morphological differences between specimens. This 
tool is useful for diagnosing contamination or misidentifications 
as taxonomy is displayed below each image. 

NEW! The specimens can be displayed sorted by identification 
or BIN. In the general options, users can also select how many 
images to display per row and per page. 

Image Licensing is viewable upon mouse rollover. To view the 
attribution and further details on specific images, open the 
specimen data page from the record list. 



Image Library for Lepidoptera (Displayed Specimen Images: Copyright 2010, 
Bavarian State Collection of Zoology) 
Workbench: Distribution Map Analysis 

The Distribution Map plots the collection points for a selected set of specimens when geographic reference data is available. There are 
two versions available for use, including the Interactive Map, which is new in BOLD version 3.6. 

NEW! Interactive Map 


GPS points are clustered on the map in coloured circles, and single 
records are signified by a blue marker. The circles indicate that 
there are multiple GPS points clustered there - hovering over 
these circles will display the bounds of the cluster. Green circles 
indicate that there is more than 1 point, yellow circles have more 
than 10 points, and orange circles have more than 100 points. 
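
The colour scheme can be restated as a small lookup, which may help when reading a dense map. This is only a restatement of the thresholds given above; the function name marker_style and the returned labels are hypothetical.

```python
def marker_style(point_count: int) -> str:
    """Return the symbol used for a map location with this many GPS points."""
    if point_count < 1:
        raise ValueError("a map symbol represents at least one point")
    if point_count == 1:
        return "blue marker"
    if point_count > 100:
        return "orange circle"
    if point_count > 10:
        return "yellow circle"
    return "green circle"
```

Under this reading, a cluster of exactly 10 points is still green and a cluster of exactly 100 points is still yellow.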

The square symbol in the upper right corner of the map is a 
search tool that allows users to draw a rectangle on the map to 
select all the records that fall within that region. The records can 
then be searched on BOLD in order to allow users to download 
or run further analysis on that subset of records. It also allows 
users to edit records to quickly correct GPS points, if they are 
found to be incorrect. 


Users can zoom in or out on the map using their mouse scroll 
wheel, or with the plus/minus buttons provided in the upper left 
hand side. Clicking on one of the GPS cluster markers will zoom 
in to that cluster. 

The Interactive Map is built on the open-access platform 
OpenStreetMap and therefore can be published, re-used, and 
modified at the user’s discretion. 


Google Earth 

The Google Earth map is a display of the specimen collection 
points in the program Google Earth. This is free software that can 
be downloaded from the web. Google gives permission for use 
in publications as long as the Google logo remains on the image. 


The Google Earth map is a portable KML file download which can 
be shared among colleagues. It is available by clicking on the “Open 
in Google Earth” link at the top of the Interactive Map. This file 
has embedded specimen images, along with specimen identifiers, 
country, province/state, institution/collection information, and the 
Extra Info field. 






Google Earth Map featuring BOLD records 
Workbench: Taxon ID Tree 




The Taxon ID Tree functionality allows for the generation of 
dendrograms from sequence data using the Neighbour Joining 
algorithm. Sequence alignment and multiple labelling options are 
available. This tool can be accessed by clicking on “Taxon ID Tree” 
under the sequence analysis panel on the Project Console and 
Record List pages. 

Results of the analysis include a tree in PDF format for easy 
dissemination and publication, Newick format for rendering in 
other formats and layouts (e.g. circular), and a taxonomy report 
that provides a breakdown of the data used. When the option to 
include matching images and spreadsheet is selected, the system 
returns a spreadsheet with specimen identifiers and full taxonomy 
in the same order the records appear in the tree, as well as the 
images of the specimens in the same order as the tree. Using 
these options helps catch data entry and lab errors where samples 
may be mixed up. They are also useful for detecting misidentified 
specimens. 


NEW! BOLD now provides the ability to build larger 
Taxon ID trees, up to 12,000 COI sequences. For other 
markers and unaligned sequences, trees are limited to 
3,000 sequences. 


Sequence Data 

Nucleotide or Amino Acid for coding markers 

Distance Model 

Kimura 2 Parameter, Jukes Cantor, and Pairwise 
Distance models are available. 

Tree Building 
Method 

Neighbour Joining is the only method at this time. 

Marker 

Select the marker from a multi-gene dataset. 

Alignment Options 

Users can utilize the BOLD aligner (amino-acid 
based HMM), Kalign, MUSCLE or use unaligned 
sequences. 

Terminal Branch 
Labels 

Many options for labels to add to the end of each 
branch including taxonomy, geography, identifiers, 
sequence details, BIN identifiers 

Photographs 

Option to include matching specimen photographs 
and spreadsheet for comparison. 

Filters 

Can be applied to disregard sequences below 
a given length (since very short sequences can 
skew the results), omit sequences with more than 
1% ambiguous bases, or to exclude problematic 
sequences. 

Colour 

By default, problematic sequences are highlighted 
(blue for stop codons and red for contaminations 
or user-applied flags). Other colourization 
options include: taxonomy, geography, extra info, 
sequence age and BINs. 

Ambiguous Base/ 
Gap Handling 

Complete or Pairwise Deletion option. 

Result Options 

Choose to view the results immediately or to 
have the results emailed when completed. 

NEW! Page 
Orientation 

Trees can now be displayed in landscape 
orientation. 


Parameters available for Taxon ID Tree 
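
The three distance models named in the parameter list have standard textbook definitions, sketched below for two aligned nucleotide sequences. This illustrates the general formulas rather than BOLD's own code. It assumes that “Pairwise Distance” means the uncorrected proportion of differing sites (p-distance) and that pairwise deletion skips any site where either sequence has a gap or ambiguous base. All function names are hypothetical.

```python
import math
from typing import Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_changes(seq1: str, seq2: str) -> Tuple[int, int, int]:
    """Return (compared sites, transitions, transversions)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions


def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected proportion of differing sites."""
    sites, ts, tv = count_changes(seq1, seq2)
    return (ts + tv) / sites


def jukes_cantor(seq1: str, seq2: str) -> float:
    """Jukes-Cantor distance: -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq1, seq2)
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura 2 Parameter distance from transition (P) and transversion (Q) rates."""
    sites, ts, tv = count_changes(seq1, seq2)
    P, Q = ts / sites, tv / sites
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))
```

Both corrected distances are undefined for very divergent pairs, in which case math.log raises ValueError. For closely related sequences all three values are similar, with the corrected distances slightly larger than the p-distance.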


Taxon ID Tree Parameter Page 



Standard Taxon ID Tree (Lepidoptera) 


Workbench: Distance Summary 

It is desirable for barcodes to show very low sequence 
divergence within a species, with significantly higher 
sequence divergence at higher taxonomic levels. The 
Distance Summary tool gives a report of sequence 
divergence between barcode sequences at the conspecific 
and congeneric levels. 

Distance values are calculated using a user-selected distance 
metric. Comparisons are performed between the given 
taxonomic levels with the frequency plotted as shown in 
the figure to the right. Details for the comparisons done 
at the levels of species, genus and family are available by 
clicking on the links in the top right corner of the Distance 
Summary results. 
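
As an illustration of the conspecific and congeneric comparisons described above, the following sketch groups pairwise distances by taxonomic level. It is only a sketch under stated assumptions: it uses an uncorrected p-distance in percent, takes the genus to be the first word of the species name, and the function names and record layout are hypothetical rather than part of BOLD.

```python
from itertools import combinations
from statistics import mean


def p_distance(seq1, seq2):
    """Percent of differing sites, ignoring gaps and ambiguous bases."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in "ACGT" and b in "ACGT"]
    return 100 * sum(a != b for a, b in pairs) / len(pairs)


def distance_summary(records):
    """Summarise distances within species and within genera.

    records: list of (species_name, aligned_sequence) tuples.
    Returns {level: (min, mean, max)} for each level that has comparisons.
    """
    levels = {"conspecific": [], "congeneric": []}
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            levels["conspecific"].append(d)
        elif sp1.split()[0] == sp2.split()[0]:
            # Same genus, different species.
            levels["congeneric"].append(d)
    return {level: (min(v), mean(v), max(v)) for level, v in levels.items() if v}
```

A well-behaved barcode dataset would show conspecific values clustered near zero and clearly larger congeneric values.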

NEW! Complete Deletion 

BOLD 3.6 includes the option to analyze sequences 
using Complete Deletion for handling ambiguous 
bases and gaps. This is available in the Distance 
Summary, Barcode Gap and Alignment Browser. 
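
The difference between Complete Deletion and Pairwise Deletion can be sketched as follows. This is an illustrative example, not BOLD's code: it assumes that any character other than A, C, G or T counts as a gap or ambiguous base, and the function names are hypothetical.

```python
VALID = set("ACGT")


def complete_deletion(alignment):
    """Drop every column that has a gap or ambiguous base in ANY sequence."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    keep = [i for i in range(length)
            if all(seq[i] in VALID for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def pairwise_sites(seq1, seq2):
    """Count the sites usable for ONE pair under pairwise deletion."""
    return sum(1 for a, b in zip(seq1, seq2) if a in VALID and b in VALID)


alignment = ["ACGTAC", "AC-TAC", "ACGNAC"]
trimmed = complete_deletion(alignment)       # columns 3 and 4 are removed
usable = pairwise_sites(alignment[0], alignment[1])
```

Complete Deletion compares every pair over the same set of sites, at the cost of discarding data; Pairwise Deletion keeps more sites per comparison, but different pairs are then compared over different positions.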



Distance Summary Analysis 


Workbench: Sequence Composition 

The frequency of DNA bases, with particular emphasis 
on GC-content, can be a useful metric for evolutionary 
biologists. GC-content within the barcoding region of 
COI has been correlated with GC-content of the entire 
mitochondrial genome for many species. 

The Sequence Composition tool allows the user to view 
the frequency of each base (G, C, A and T) as well as graphics 
of GC content. This information is reported for the 
overall sequence composition as well as for codon 
positions 1,2, and 3. 
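
The calculation behind this report can be sketched in a few lines. The example below is a minimal illustration, not BOLD's implementation: it assumes the sequence starts in reading frame at the position given by the hypothetical `frame_start` parameter and ignores gaps and ambiguous bases.

```python
from collections import Counter


def base_frequencies(seq):
    """Percent frequency of each of A, C, G and T."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: 100 * counts[b] / total for b in "ACGT"}


def gc_content(seq):
    """Percent of unambiguous bases that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return 100 * sum(b in "GC" for b in bases) / len(bases)


def gc_by_codon_position(seq, frame_start=0):
    """GC content at codon positions 1, 2 and 3 (frame_start is an assumption)."""
    coding = seq[frame_start:]
    return {pos + 1: gc_content(coding[pos::3]) for pos in range(3)}
```

For the toy sequence `ATGGCCATT` (codons ATG, GCC, ATT), the overall GC content is 4 of 9 bases, while codon position 3 (G, C, T) has a higher GC content than positions 1 and 2.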


Sequence Composition Analysis 


Workbench: Barcode Gap Analysis 

The Barcode Gap analysis presents users with an examination 
of the distance to the nearest neighbour for each of the 
species in the list of specimens. Distances are highlighted if 
the nearest neighbour is less than 2% divergent, or when the 
distance is less than the intra-specific distance. 
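
The highlighting rule described above can be illustrated with a small sketch. It makes several assumptions that are not stated in this Handbook: distances are uncorrected p-distances in percent, the "intra-specific distance" is taken to be the maximum distance within the species, and all names in the code are hypothetical.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Percent of differing sites, ignoring gaps and ambiguous bases."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in "ACGT" and b in "ACGT"]
    return 100 * sum(a != b for a, b in pairs) / len(pairs)


def barcode_gap(records, threshold=2.0):
    """records: list of (species_name, aligned_sequence) tuples.

    For each species, report the maximum intra-specific distance, the
    distance to the nearest neighbouring species, and whether the species
    would be highlighted.
    """
    species = sorted({sp for sp, _ in records})
    max_intra = {sp: 0.0 for sp in species}
    nearest = {sp: (float("inf"), None) for sp in species}
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            max_intra[sp1] = max(max_intra[sp1], d)
        else:
            if d < nearest[sp1][0]:
                nearest[sp1] = (d, sp2)
            if d < nearest[sp2][0]:
                nearest[sp2] = (d, sp1)
    report = {}
    for sp in species:
        nn_dist, nn_species = nearest[sp]
        report[sp] = {
            "max_intra": max_intra[sp],
            "nn_distance": nn_dist,
            "nn_species": nn_species,
            # Highlight when the neighbour is too close in either sense.
            "highlight": nn_dist < threshold or nn_dist < max_intra[sp],
        }
    return report
```

A species that is not highlighted has a nearest neighbour that is both at least 2% divergent and further away than the variation observed within the species itself.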


Tip: Graphs for Publication 

When the “Expand” icon (shown to the 
right) appears next to a graph, the graph can 
be expanded to a higher-quality version 
suitable for use in publications. 


Barcode Gap Analysis 


Workbench: Accumulation Curve 

An accumulation curve of standardized DNA barcodes 
and related features provides a clear, transparent and 
reproducible estimate of the diversity and sampling efficiency 
of areas or collections. 

This tool also allows users to quickly compare sampling 
efficiency at multiple regions by multiple taxonomic levels. 
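
The idea of an accumulation curve built by random sampling can be sketched as follows. This is an assumption-laden illustration rather than BOLD's algorithm: records are shuffled repeatedly, the number of distinct labels (for example species names or barcode clusters) seen so far is counted after each record, and the counts are averaged over the iterations. The function name and the `seed` parameter are hypothetical.

```python
import random


def accumulation_curve(labels, iterations=20, seed=0):
    """Mean number of distinct labels after sampling 1..n records.

    labels: one label per record (e.g. a species name or a BIN).
    """
    rng = random.Random(seed)   # fixed seed keeps the curve reproducible
    labels = list(labels)
    totals = [0] * len(labels)
    for _ in range(iterations):
        order = labels[:]
        rng.shuffle(order)
        seen = set()
        for i, label in enumerate(order):
            seen.add(label)
            totals[i] += len(seen)
    return [t / iterations for t in totals]
```

A curve that flattens out suggests that most of the diversity in the sampled area or collection has been captured; a curve that is still climbing at the last record suggests that further sampling would reveal more taxa.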


Accumulation Curve Result - BARG 

Number of Records: 1594 
Sampling Order: Random Sampling (20 iterations) 
Curves Based On: Barcode Cluster, Species 
Graphs Grouped By: Dataset segregated by geo.region 
Graphs: 58 
Curves Per Graph: 2 

Accumulation Curve Analysis 


Workbench: Alignment Browser 

Managing sequence alignments and base calls is a critical 
step in any barcode analysis. BOLD provides an integrated 
alignment browser that allows users to analyze and edit 
sequences without needing to import them into third-party 
software. 

NEW! The updated alignment browser supports direct 
editing of records in the database. Users can edit sequences, then 
upload the edited sequences to their records. 

Alignment Browser 


Workbench: Diagnostic Characters 

The Diagnostic Character analysis provides a means to 
examine nucleotide or amino acid polymorphism between 
sets of sequences that are grouped by taxonomic or 
geographic labels. More specifically, this tool identifies 
consensus bases from each group, compares them to 
those from the remaining sequences in other groups, then 
characterizes how unique each consensus base is. The tool 
categorizes consensus bases by their diagnostic potential. 

Changes in the input sequences, filtering options or the 
analysis parameters can affect the consensus sequences in 
each group and the diagnostic potential will be different 
between analyses. In general, having more sequences per 
group will provide a more accurate diagnosis of each group, 
as it reduces the problem caused by small sample size. 
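
The general approach can be illustrated with a simplified sketch. The three categories used here ("diagnostic", "partial" and "uninformative") and their definitions are assumptions made for this example and are not BOLD's actual categories; the function name and record layout are likewise hypothetical.

```python
from collections import Counter, defaultdict


def diagnostic_characters(records):
    """records: list of (group_label, aligned_sequence) tuples.

    Returns {group: [(position, consensus_base, category), ...]}.
    """
    groups = defaultdict(list)
    for label, seq in records:
        groups[label].append(seq.upper())
    length = len(records[0][1])
    result = {}
    for label, seqs in groups.items():
        # Every sequence that belongs to a different group.
        others = [s for other, ss in groups.items() if other != label for s in ss]
        sites = []
        for i in range(length):
            column = [s[i] for s in seqs]
            consensus, count = Counter(column).most_common(1)[0]
            if any(s[i] == consensus for s in others):
                category = "uninformative"   # base also occurs in other groups
            elif count == len(column):
                category = "diagnostic"      # fixed here, absent elsewhere
            else:
                category = "partial"         # absent elsewhere, not fixed here
            sites.append((i + 1, consensus, category))
        result[label] = sites
    return result
```

With only two or three sequences per group, a base can appear fixed by chance, which is why larger groups give a more reliable diagnosis.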



Diagnostic Characters Analysis 


Workbench: BIN Discordance Report 

The Barcode Index Number module analyzes new COI 
sequences and assigns them to an existing or a new BIN. 
Besides generating BIN pages, this system acts as a rapid 
check of the validity of taxonomic designation on specimen 
records. The BIN Discordance report facilitates this check 
by comparing the taxonomy on selected records against all 
others in the BINs they are associated with. 

The results are sorted by the degree of conflict, displaying 
those records in BINs where there is a phylum level conflict 
first (likely the result of cross-contamination) down to 
species level conflicts. Users can pull up records from this 
page to examine ancillary data or to edit the taxonomy 
where there is an error. 
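
The comparison and sorting described above can be sketched as follows. This is a simplified illustration, not BOLD's implementation: it assumes every record carries a complete taxonomy, uses a reduced list of ranks, and treats a BIN with a single record as a singleton. The record layout and function name are hypothetical.

```python
from collections import defaultdict

# Ranks from highest to lowest; a reduced list chosen for this example.
RANKS = ["phylum", "class", "order", "family", "genus", "species"]


def bin_discordance(records):
    """records: list of dicts with a 'bin' key plus one key per rank in RANKS.

    Returns (discordant, concordant, singletons), where discordant is a list
    of (bin_id, highest_conflicting_rank) sorted from phylum-level conflicts
    down to species-level conflicts.
    """
    bins = defaultdict(list)
    for rec in records:
        bins[rec["bin"]].append(rec)
    discordant, concordant, singletons = [], [], []
    for bin_id, members in bins.items():
        if len(members) == 1:
            singletons.append(bin_id)
            continue
        # First (highest) rank at which the members disagree, if any.
        conflict = next((rank for rank in RANKS
                         if len({m[rank] for m in members}) > 1), None)
        if conflict is None:
            concordant.append(bin_id)
        else:
            discordant.append((bin_id, conflict))
    discordant.sort(key=lambda item: RANKS.index(item[1]))
    return discordant, concordant, singletons
```

Sorting by the highest conflicting rank puts the most severe problems, such as likely cross-contamination, at the top of the list.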
BIN Discordance Report - WWGSL 

Overview 

The Barcode Index Number system clusters COI sequence data into OTUs (called BINs) independent of prior taxonomic assignment. As such, it provides a means of 
confirming the concordance between barcode sequence clusters and species designations. This report performs this validation by comparing the taxonomy on input 
records against all others in the same BINs, including those submitted and managed by other users. 

Data Selected: 460 Records; 458 Records with BINs, representing 86 BINs 

Breakdown of Data: 

Taxonomically Discordant: 15 BINs, 77 Records 
Taxonomically Concordant: 70 BINs, 380 Records 
Singletons: 1 BIN, 1 Record 

Detailed Results 

Discordant BINs are sorted by rank of discordance (phylum level conflicts at the top and species level at the bottom). Concordant and singleton BINs are 
separated into secondary tabs. Member records for a BIN can be retrieved by selecting the checkboxes next to Process IDs of interest and clicking “Record 
Search”. 


BIN Discordance Report 